First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2022 Robert H. Chen and Chelsea Chen
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under US Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloguing-in-Publication Data
Names: Chen, Robert H., 1947- author. | Chen, Chelsea, author.
Title: Artificial intelligence : an introduction for the inquisitive reader / authored by Robert H. Chen, Chelsea Chen.
Description: Boca Raton : CRC Press, 2022. | Includes bibliographical references and index.
Identifiers: LCCN 2021055530 (print) | LCCN 2021055531 (ebook) | ISBN 9781032103471 (hardback) | ISBN 9781032101842 (paperback) | ISBN 9781003214892 (ebook)
Subjects: LCSH: Artificial intelligence.
Classification: LCC Q335 .C4845 2022 (print) | LCC Q335 (ebook) | DDC 006.3--dc23/eng/20211116
LC record available at https://lccn.loc.gov/2021055530
LC ebook record available at https://lccn.loc.gov/2021055531
ISBN: 978-1-032-10347-1 (hbk)
ISBN: 978-1-032-10184-2 (pbk)
ISBN: 978-1-003-21489-2 (ebk)
Typeset in Minion
by MPS Limited, Dehradun
IN THIS BOOK, THE ADVENTURES IN THE QUEST FOR artificial intelligence (AI) are exemplified by the entertaining demonstrations of “man versus machine” competitions such as IBM Deep Blue versus Garry Kasparov and Google AlphaGo versus Lee Sedol, but the real significance of AI evolves from the human ideas behind the machines’ algorithms and what the machines may be capable of in the future. Starting with mechanical calculation, the development of artificial intelligence can be seen as a natural progression of technology abetted by the rise of computer science. With the hardware, software, and communications in hand, the quest for the long-dreamed-of “expert system” began in earnest with the pure logic-based “top-down” machine, where axioms go in and mathematical theorems come out; but because of the inherent contradictions in pure logic and the lack of data and computer power, the Logic Theorist was supplanted by Big Data and massively parallel processing machines in the so-called “bottom-up” approach.
Bottom-up AI mimics the structure and pattern recognition capability of human brains with electrically activated artificial neurons forming synaptic patterns of recognition and “thought” in an artificial neural network formed in a parallel-processing computer. The synaptic patterns are produced by Markov chain modeling of neuron activation with the neurons weighted in accord with a training set in a process called parameterization. The gradient descent of vector calculus minimizes the difference between the machine patterns and the ground truth, backpropagating the differences using the chain rule of calculus provides the machine learning, and hyperparameterization fine-tunes the accuracy and computational efficiency of the machine’s “thinking” process.
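The weighting, gradient descent, and chain-rule backpropagation described above can be sketched in a few lines of Python. This is a minimal illustration under assumed values (a single sigmoid neuron, a squared-error loss, and an invented learning rate), not the book's code:

```python
import math

# A single artificial "neuron": weight w and bias b are the parameters.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, lr=0.5, epochs=2000):
    """Gradient descent: backpropagate the difference between output and
    ground truth through the chain rule to update the parameters
    (parameterization); the learning rate lr is a hyperparameter that
    could be fine-tuned (hyperparameterization)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, truth in samples:
            y = sigmoid(w * x + b)        # forward pass
            dloss_dy = 2.0 * (y - truth)  # derivative of squared error
            dy_dz = y * (1.0 - y)         # derivative of the sigmoid
            grad = dloss_dy * dy_dz       # chain rule
            w -= lr * grad * x            # descend the gradient
            b -= lr * grad
    return w, b

# Ground truth for a trivial threshold task: output 1 when x is positive.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train(data)
```

After training, the neuron's output crosses 0.5 at roughly x = 0, matching the ground truth; a real network repeats this update across many neurons in many layers.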
Convolutional neural networks used in computer vision, armed with Big Data and massively parallel processing, have brought machine vision into almost every facet of industry and society.
Predictive analytics AI is employed almost everywhere, most critically in business, science, politics, and the military. AI reinforcement learning has produced machines that can best the top human video gamers without even knowing a priori the rules of the games being played, and in the real world of imperfect knowledge, AI systems have beaten the best in Texas Hold’em poker competition.
Because of the ambiguities and vagaries of speech, natural language processing using generative recurrent neural networks, which can assess immediate speech in terms of what was spoken or inferred before, is presently the most promising approach.
The basic ideas behind the implementing algorithms and the processes enabling a machine to learn are described within the theme of the difference between human and artificial intelligence, with the touchstone question being the capability of machines to do mathematics.
The ability required of the reader, however, is only basic calculus, and the particular equations of the artificial intelligence algorithms are explained with examples so that even those who have never learned, or have long forgotten, their calculus can easily understand the ideas behind the algorithms, hopefully kindling an appreciation of the role of mathematics in artificial intelligence.
As for form, some methods of mathematical exposition, for example the boldface representations of vectors, and curly letter matrices are employed only when necessary for clarity, as their identities are evident from the context. Wording and spelling are in the American style but with British punctuation leaving quotes and parentheses inside sentences of which they are only a part, at the dreadful consequence of laying bare the period or comma.
Only the designs of the algorithms are presented; for those wishing to see the code, referrals are made to the relevant articles, and free code-hosting platforms and excellent online programming tutorials offer great convenience and hands-on coding experience.
The authors would like to thank Callum Fraser of Taylor & Francis Publishing for professional guidance, Mansi Kabra for expert handling of the manuscript, and the reviewers of the manuscript, who provided excellent critical suggestions.
He was a man with two common first names, Arthur Samuel. Born and raised in a middle-class family in Emporia, Kansas, in mid-America, and a graduate of the local College of Emporia, he appeared to be an ordinary young man brought up in a traditional American society. However, the genial but inherently cautious Arthur Samuel was soon to be known as a man of great distinction in an unusual new realm. For from Emporia, his special talents took him to the citadel of engineering education, MIT, which provided him a sound electrical engineering basis, and thence to the fount of technical innovation, AT&T Bell Labs, where he worked on the telecommunications systems that transformed the whole world, and on the radar technology critical to victory in World War II.
His contributions were significant, but he was not done, for after the War, closely tracking the locus of new technology, Samuel arrived at the University of Illinois to work on the giant ILLIAC scientific computer, and after being recruited by IBM, he was immersed in the design of the world's first commercial mainframe computers, which were soon to disrupt industry and education all over the world. But it was then that he came up with an audacious idea that would be the precursor of a transformative technology that would further disrupt industry and the lives of almost all the peoples of the world. His idea was inspired by a return to his everyman roots, as the conjurer of a “thinking machine” playing a trivial game known and played by almost everyone in America.
Working in Poughkeepsie in 1949, Arthur Samuel noted that the storage and display matrices of the new IBM 701 computers looked just like a checkerboard, and like almost everybody in America at the time, he played that common game of jumping about a board to capture an opponent's pieces and “kinging” your own.
In his uncommon mind however, Samuel saw a clever way to create an image for IBM as the creator of a wondrous checkers-playing computer that could challenge and defeat humans at their own game, garnering publicity for the marvelous capabilities of IBM computers.
It was clearly a creative marketing scheme, but his company was not in the least supportive, for its venerable chairman, Thomas Watson, Sr., was, like Samuel, conscious of IBM's image, but in the opposite sense of avoiding the spectre of a menacing “IBM thinking machine” going around beating up on humans all over the country.
The idea of the homey IBM Selectric typewriters and office machines souped-up like Frankenstein's monster to triumph over small-town residents at checkers was anathema to the friendly helpmate that was Watson Sr.'s marketing vision for IBM products.
Like Victor Frankenstein the earnest scientist, the dogged Samuel followed his checkered muse, jumping over the IBM powers by developing the 701 checkers machine on his own time. He encoded the rules of the game and programmed a decision tree move generator, evaluating moves by attaching numerical values to the move options that would most likely lead to a desired advantage. Included in the evaluations were nuggets of checkers wisdom familiar to any checkers player, for instance depleting the opponent's checkers by even challenges when ahead, simple gambits, and guile that today are called heuristics.
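This kind of numerical move evaluation can be sketched as a weighted sum of simple heuristic features; the features, weights, and candidate moves below are invented for illustration and are not Samuel's actual evaluator:

```python
def evaluate(features, weights):
    """Attach a numerical value to a move: a weighted sum of heuristic
    features describing the board after the move is made."""
    return sum(weights[name] * value for name, value in features.items())

def best_move(candidates, weights):
    """Pick the move whose resulting position scores highest."""
    return max(candidates, key=lambda move: evaluate(candidates[move], weights))

# Illustrative heuristics: piece advantage, kings, and mobility.
weights = {"piece_advantage": 1.0, "kings": 1.5, "mobility": 0.2}
candidates = {
    "jump": {"piece_advantage": 1, "kings": 0, "mobility": 2},
    "king": {"piece_advantage": 0, "kings": 1, "mobility": 1},
}
choice = best_move(candidates, weights)
```

Here "jump" scores 1.0 + 0.4 = 1.4 while "king" scores 1.5 + 0.2 = 1.7, so the evaluator prefers kinging; a decision-tree move generator applies such scoring across the branches of possible continuations.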
But this rote learning from rules, simple decision evaluators, and well-known heuristics would take the 701 checkers machine only to the level of its creator, which in Samuel's case, despite his technical brilliance, was below average, and so he consulted expert players for more advanced winning strategies and tactics.
Samuel found it difficult, however, to incorporate their checkers skill, which was based mostly on knowledge of the proclivities and idiosyncrasies of opponents, and on no little hard-to-codify “feel” and outright guessing.
Since with this “top-down” encoding of rules, tree-search results, heuristics, and learning from some expert players, the 701, although now “expert”, could play only as well as those players, it was clear to Samuel that the 701 expert system needed to learn from playing against other expert players, and that like a human player, the 701 could accumulate playing skill “bottom-up”, gaining knowledge of how to win from its own actual game experience by reinforcing its good move choices and degrading the bad ones.
Samuel now evaluated moves based upon their ultimate success or failure in training sessions based on the recorded matches of expert players, and the reward and punishment of good and bad moves in actual games against human players, and even in games played against itself. These were three learning schemes that would later be called respectively supervised learning on training sets, reinforcement learning in game situations, and unsupervised learning by playing progressively improving versions of itself, the basic methods of today's “machine learning”, a term Samuel himself coined.
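The reward-and-punishment idea can be sketched as a table of move values nudged toward each game's outcome; the value table, move names, and learning rate here are illustrative assumptions, not Samuel's implementation:

```python
def reinforce(values, moves_played, outcome, lr=0.1):
    """Nudge the value of each move played toward the game's outcome
    (+1 for a win, -1 for a loss): good moves are rewarded, bad punished."""
    for move in moves_played:
        old = values.get(move, 0.0)
        values[move] = old + lr * (outcome - old)
    return values

values = {}
reinforce(values, ["advance", "trade"], outcome=+1)    # a won game
reinforce(values, ["advance", "retreat"], outcome=-1)  # a lost game
# "trade" now carries positive value, "retreat" negative, and
# "advance", which appeared in both outcomes, sits between them.
```

Run over many games, against humans or against versions of itself, such updates accumulate into the machine's knowledge of which moves tend to win.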
By means of machine learning, the IBM 701 checkers machine slowly improved, and in 1962 after 13 years of part-time development, Samuel's machine challenged and easily defeated the Connecticut state champion Robert W. Nealey.
After the match, the former champion said he had not had such competition from anyone since 1954, when he lost his last game, but in the rueful pride characteristic of accomplished human beings, he also circumspectly exhibited a clear approbation of the IBM 701's “intelligence”.
Despite its successes, the checkers-playing IBM 701 had no such baneful pride, much less circumspection; but if pride instigates the will to succeed, thereby producing greater effort, the lack of such pride might make a machine inferior to humans in determination, yet the machine has no equal in effort, for it needs only electricity to practice tirelessly 24/7 against not only humans but other machines and itself.
And pride although motivating, if once broken can devolve to paranoia, as the great chess champion Garry Kasparov would later reveal in his acrimonious duel with IBM's Deep Blue.
Samuel's cool and collected IBM 701 in the ensuing years would have every reason to be proud, for it remained undefeated for 15 years until 1977 when it finally lost, not to a human, but to a rival checkers program developed at Duke University.
How prideful humans internalize defeat has been analyzed, but how a machine internalizes a defeat may never be known, for deep within the hidden layers of the artificial neural networks of today's AI machines, the germination and processing of a “thought” is largely unfathomable even to the learning algorithm's creator, and that unknown, contrary to the coldly logical computer ethos that usually makes it superior to an emotional human being, could conceivably leave room for machine attitude, willfulness, and even emotion.
The blessings of computers and artificial intelligence have been the machine's potential to increase production while freeing people from the drudgery of everyday work, allowing them to think about the work rather than just enduring the tedium of doing it, thus improving efficiency and leaving more time to pursue lofty goals and enjoy life. The bane of the machine is its potential to take over almost all human occupations, relegating humans’ activities to the care and feeding of the machines.
Possibly at odds with humanity's long-term self-interest, organizations of humans, for intellectual or commercial gain, have initiated fair Man vs. Machine matches of supreme mental combat in the arenas of two of the primary indicia of human intelligence: IBM's Grand Challenges in Western chess and Google DeepMind's foray into the ancient Eastern game of Go.
Intelligence can be manifested in contests of strategic and tactical thinking within a game's metes and bounds, with superiority demonstrated by the rationality and creativity of moves that produce successful outcomes.
From Watson Senior to Watson Junior, IBM's attitude towards thinking machines reversed; for under Thomas Watson, Jr., the natural next step for IBM's electronic computers was to extend the machines’ ken from simple checkers to sophisticated chess; engendering fear among the populace was not a concern to him, seeking admiration and subsequent income for IBM was the goal.
In 1996, the chess computer Deep Blue, developed from the Deep Thought machine designed by Carnegie Mellon University graduate student F.H. Hsu and further developed by his team at IBM, challenged Garry Kasparov, generally acknowledged as the greatest player in the history of the game, to a six-game championship match.1
Kasparov won that first match 4-2, but the next year in New York City in May 1997, arrayed against Kasparov was the upgraded IBM massively parallel RS/6000 SP Super Workstation Deep Blue chess-playing machine, replete with newly developed accelerator chip sets.2
Deep Blue's specifically designed high-performance hardware and software could minimax tree-search 50 billion possible positions at a rate of 200 million moves per second. After alpha-beta pruning of the search tree, a tree-depth of six to eight moves was searched to select optimum moves.3
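Minimax search with alpha-beta pruning can be illustrated on a toy game tree; Deep Blue's search of course ran over chess positions on custom hardware, and the nested lists below are purely illustrative:

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning: branches the opponent would
    never allow are cut off without being searched."""
    if isinstance(node, (int, float)):
        return node  # leaf: static evaluation score of the position
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: opponent will avoid this branch
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cutoff
        return value

# Three candidate moves, each answered by two opponent replies.
tree = [[3, 5], [2, 9], [0, 7]]
best = alphabeta(tree, float("-inf"), float("inf"), True)
```

With the root maximizing, the three subtrees evaluate to 3, 2, and 0, so the search returns 3; the leaves 9 and 7 are pruned without ever being examined, which is what lets a fixed move budget reach greater depth.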
Kasparov (playing white) won the first game with an “anti-computer” strategy in which deliberately suboptimal moves are made to confuse the rationally wired computer; this seemed to work in the first game, which he won with the white advantage, but his confidence was shaken by a devastating second-game loss, after which he accused the Deep Blue team of illegal in-game human intervention. IBM denied this, saying that adjustments by humans were made only between games, in accord with the rules. Deep Blue won the match 3½-2½, the first time in history that a machine had defeated a reigning World Champion in a championship match.
Afterwards, Deep Blue's game logs did reveal a random error in Game 1, and commentators speculated that Kasparov interpreted the subsequent fixes instead as Game 2 in-game changes by the Deep Blue team; in other words, Kasparov would not accept that he could be beaten by a machine.
The closeness of the match was not definitive of the superiority of machine over man, but Kasparov's paranoia throughout, and his abysmal resignation in Game 6 could at least establish that Deep Blue's cold logic could triumph over the warm frailty of human emotion and the pride of extremely self-aware human beings.
After its stunning victory, could Deep Blue the machine likewise be self-aware of its superiority? It will never be known because the fate of most innovative research devices is dissection; the RS/6000 SP was sent back to IBM's test floor and the shell returned, but two cards went to IBM headquarters in Armonk for visitor demonstrations, and the rest were inserted into the older version RS/6000 SP and dispersed to various workstations and parts shelves.
In the press conference after the match, Kasparov was cheered and heartily encouraged by an audience including many chess masters, expert commentators, the press, and the general public, but when IBM's Deep Blue team assembled on the stage, their notable technical achievement notwithstanding, they were met with thinly-veiled disdainful murmuring.
There is no sin in standing up for humankind against a machine, but Deep Blue's victory evoked unease, fear, even hostility, and the audience apparently sensed menace rather than hope. Perhaps Watson Senior was right after all.
After this challenge, the score was Machine 2, Humans 0, and after Nealey's proud acknowledgment of defeat came Kasparov's arrogance-fed emotional collapse, presaging psychological frailties that may or may not ever manifest themselves in a machine.
The West's explicit “kill the King” chess ethos was in full display in Manhattan with the machine defeating the human champion at his peak; would the masters of the East's implicit “surround and conquer” board game of Go meet the same fate?
In 2014, the new tech giant Google acquired Britain's DeepMind to challenge Go Masters from Europe, Japan, Korea, and China, paragons worshipped in East Asian societies as exalted members of the most sublime class of analytical geniuses.
The 19 × 19 board with its 361 stone placements is simply played: capture territory by surrounding it, and your opponent's stones, with your own; but the lower bound of 2 × 10¹⁷⁰ possible positions is a prodigious number, indeed greater than the number of atoms in the Universe. Faced with such a daunting number of static decisions and the genius-level creativity of Go Masters, computer scientists, many of them avid Go players, had always believed that a Masters-level Go-playing computer was an absolute impossibility.4
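The magnitudes quoted above are easy to check with integer arithmetic; the 10⁸⁰ atom count used below is the usual rough estimate for the observable universe, not a figure from the text:

```python
# Lower bound on legal Go positions, as quoted above.
go_positions_lower_bound = 2 * 10**170
# Rough standard estimate of atoms in the observable universe.
atoms_in_universe = 10**80

# Even the raw count of board configurations (each of the 361 points
# empty, black, or white), most of them illegal, is only about 10**172.
raw_configurations = 3**361

assert go_positions_lower_bound > atoms_in_universe
assert raw_configurations > go_positions_lower_bound
```

The lower bound exceeds the atom count by a factor of about 10⁹⁰, which is why exhaustive search of the kind that conquered checkers and chess was never an option for Go.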
Nevertheless, early Go computers using simplified tree searches were developed that could play at an amateur level, but in 2015, for the first time, Google's AlphaGo won a sanctioned match without a handicap against a professional, the European champion Fan Hui, and then went on to defeat Japan's legendary top player Iyama Yuta, and the next year, in a highly-publicized match, AlphaGo defeated the then reigning world champion Korea's Lee Sedol 4-1, winning a US$1 million prize (which Google graciously donated to charity).
AlphaGo's sole game loss to Lee Sedol was attributed to a “delusion” resulting from incomprehension of Lee's “divine” move, White 78; it was repaired in 2017, and the new, improved AlphaGo Master, after a 60–0 win streak against humans and rival Go-playing computers, took on the new world No. 1, the 18-year-old prodigy Ke Jie, in Wuzhen, China, believed to be the birthplace of weiqi, the Chinese name of the Japanese-named Go.
Ke Jie had been reluctant to accept AlphaGo Master's challenge, not as he said from fear of losing to a machine, but rather because he was afraid it would “copy my style” and indeed, AlphaGo Master had undergone supervised training on high-level Go match publications, and learned how to play from professionals, and its match against China's young genius very likely was a chance for another learning experience. AlphaGo Master easily won 3–0 and a US$1.5 million prize.5
Europe, Japan, Korea, and China's professional Go associations all awarded the highest 9-dan certification to AlphaGo, thus bestowing a Go Master's ultimate rank to a machine.
Ke Jie, after the match with AlphaGo Master, attributed his defeat to its “non-human playing style”, adding a doleful prospection, “After humanity spent thousands of years improving our tactics, a machine tells us that we were completely wrong; we have only scratched the surface of the essence of Go”.6
The implication was clear, the machine was more likely to discover the further mysteries in Go strategy than humans, and therefore in this realm of intelligence, Ke Jie recognized that the machine was already more proficient than the best Go Masters.7
Korea’s Lee Sedol said, “After the first game, I was surprised I lost, but from the very beginning of the second game, I could never manage an upper hand for one single move. It was AlphaGo's total victory”. In a kind but futile attempt at reassurance to the some 200 million humans all over the world who followed the match, he later added that “it was my defeat, not a defeat of mankind….”8
Although gracious, such words ring hollow then and now, for Lee Sedol, a child prodigy who gained professional rank at 12, won his first championship at 19, and was the No. 1 player in the world from 2002 to 2015, winning 18 world championships, was and is a national hero revered in Korea, and celebrated throughout the Go-playing world. He represented mankind nobly in a match with a machine, and he lost ….
Earnestly taking full blame for defeat at the hands of a machine, a shaken Lee Sedol retired in 2017 saying that no matter how hard he might try, “there is another entity that cannot be defeated”. He however like Ke Jie wistfully added that “robots will never understand the beauty of the game as we humans do”, displaying a little pique in losing, but unlike Kasparov, with an equanimity that only questioned AlphaGo's esoteric appreciation and not its playing skill.9
Lee Sedol and Ke Jie thereby revealed a cultural difference between an accepting East and a recalcitrant West represented by Kasparov that portends smoother acceptance of artificial intelligence in Northeast Asia than in Western Europe and America.
AlphaGo's artificial neural network was a 19 × 19 × 48 volume matrix input layer of artificial neurons and 13 filter-convolved hidden layers fully connected to a softmax rectifier. Using training sets based on published professional Go matches, AlphaGo's supervised learning employed a Monte Carlo Tree Search with alpha-beta pruning to reduce the number of possible move decisions to produce playout simulations that would reveal the optimum moves for various board patterns.
After this learning, different patterns of black and white stone positions could be recognized by a first machine vision convolutional neural network that revealed good and bad moves to form AlphaGo's policy network in a real match.
A second convolutional neural network evaluated the moves as to their contribution to the optimization of board position patterns. This value network could reveal the moves with the highest probabilities leading to acquiring the most territory.
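How a policy network's raw scores become move probabilities can be sketched with a plain softmax; the scores below are invented for illustration, and AlphaGo's actual policy and value networks were of course deep convolutional networks as described above:

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate board points.
move_scores = [2.0, 0.5, -1.0]
probs = softmax(move_scores)
```

The highest-scoring point receives the largest probability and the probabilities sum to one, giving the tree search a principled way to allocate its playout simulations among candidate moves.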
But this, like Samuel's checkers program, only allowed AlphaGo to play as well as the best players in the training set, and did not provide a basis for beating those players. So AlphaGo began a rigorous regimen of self-training by playing games with reward and punishment for good and bad moves in reinforcement learning against Go Masters and different versions of itself, culminating in the unbeatable and constantly improving new version called AlphaGo Master.
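The reward-and-punishment core of that self-training regimen can be shown with a deliberately tiny sketch. This is not AlphaGo's algorithm (which learns over board positions, not a two-armed choice); it is a minimal tabular value update, with an invented learning rate and exploration rate, showing how repeated reward shifts an agent's preferences beyond anything in a fixed training set.

```python
import random

# Minimal reinforcement sketch (not AlphaGo's algorithm): an agent repeatedly
# chooses between a 'good' and a 'bad' move, receives +1 or -1, and nudges its
# value estimate toward the observed reward -- learning from play, not data.
random.seed(0)

values = {"good": 0.0, "bad": 0.0}
alpha = 0.1                              # learning rate (invented)

def choose():
    # epsilon-greedy: mostly exploit current values, occasionally explore
    if random.random() < 0.1:
        return random.choice(list(values))
    return max(values, key=values.get)

for _ in range(500):                     # 500 self-play "games"
    move = choose()
    reward = 1.0 if move == "good" else -1.0
    values[move] += alpha * (reward - values[move])

print(values["good"] > values["bad"])    # True
```

After a few hundred iterations the agent's value for the rewarded move approaches +1 while the punished move drifts negative, which is the mechanism, scaled up enormously, behind AlphaGo Master's improvement against versions of itself.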
The Anglo-American AlphaGo Master now could through relentless self-study determine optimum board position strategies as it iteratively increased the probabilities of ultimate victory against the best player in the world, itself. This classic instance of diligent self-strengthening leading to a game-playing virtue through comparison with oneself is quintessentially Confucian, entirely proper for a Go master in Northeast Asia.
AlphaGo Master when stripped of its supervised training module evolved into a pure-play thinking machine with zero a priori data input. AlphaGoZero thus could learn how to play any game just like a human player learns, from the bottom-up, learning the rules while playing and adjusting to the wide variety of opponent moves and strategies by observing the results of many, many moves and strategies, and then forming a policy and value network of moves therefrom that would lead to victory.
This policy of relentlessly maximizing the probability of final victory over and above winning particular instances of adroitly gaining territory in localized territorial fights indicates that for AlphaGoZero, it is not important how you win or by how much you win, or even how you play the game, the only goal is to win.
From the human perspective, Lee Sedol's “beauty” of the game is in the many instances of the clever manipulation of stones to disrupt, hem in, surround and capture your opponent's stones to win the martial arts-like fights for territory.
Rather like Premier League football, the “beautiful game” is attractive because of the superb skills of individual players in instances of cleverly outfoxing defenders with deft footwork and precision passing, actions that are beautiful to watch but seldom contribute to a final victory, something that often is the result of an overall positional policy, typically a stultifying defense that thwarts well-organized and crowd-pleasing shots on goal, but wins through breakaway counterattacks.
So has AlphaGoZero taken the less beautiful but more efficient path to victory? Does AlphaGoZero perceive the game's objective victory/defeat reality more clearly because of its disregard for its aesthetic appeal? Can or do humans desire to emulate it? Do humans have the resolve to pursue AlphaGoZero's winning but wooden policy, and if so, is such a game worth playing?
It is of interest to note that some expert commentators have found AlphaGoZero to be perversely more human in its playing rather than what Ke Jie thought of AlphaGo Master's “less human style”. That is, the result of the comprehensiveness of its optimizing algorithms was that AlphaGo Master developed a confidence that its moves were always the best, otherwise it would not have made them. AlphaGoZero might consider a human opponent's move clever, not bad, but nevertheless inferior to a response born of algorithmic rigor, data, and tireless self-study producing a superior match policy; that is, no matter what the opponent does, AlphaGoZero has a better counter, it has an ultimate confidence because after victories over all the Go Masters, AlphaGoZero cannot help but feel superior, and ultimately be quite aware of that superiority, a pride and arrogance that may produce a dangerous attitude toward humans.10
AlphaGoZero's reinforcement and unsupervised learning regimen could not just involve playing the same version of itself over and over again, because the algorithm would overfit the data and much like a pedestrian player, merely memorize responsive moves rather than creating new strategies and tactics. AlphaGoZero was invincible because different versions of itself were not only each in turn the best in the world, but the almost infinite loop of iterative improvement meant that AlphaGoZero would develop its skill such that it could never be defeated and become infinitely good at Go, whatever that means and portends, and unbeatable except by some other machine that somehow got ahead of AlphaGoZero's learning curve.11
So far, artificially intelligent machines have challenged champion checkers and genius chess and Go masters at their peak, and soundly beaten them all. But the victories were in highly constrained domain competitions.
How about the open domain of everyday life and that very human bane of arguing? There is no subject or opinion under the Sun that cannot be argued, and everyone and their spouses are experts, regardless of subject matter. Open domain environments require knowledge of almost any subject, rapid information-processing, creative construction of an argument, persuasive exposition of a position, quick understanding of the opposition's argument, and then analytical deconstruction and sharp rebuttal of the opponent's arguments, all buttressed by references and data, to finally persuade a skeptical audience, often by means of emotive elocution, humor, and a display of self-confidence.
The first debating Grand Challenge was held in San Francisco in 2019; the proposition was: “Should the government subsidize space exploration?” IBM's Project Debater argued, supported by facts, that space exploration benefits humankind because it helps advance scientific discovery and inspires young people to think beyond themselves.
Noa Ovadia, the 2016 Israeli national debate champion, in opposition argued that there are better applications for government subsidies such as directly for scientific research here on Earth. Project Debater rebutted with historical facts that the potential technological benefits from space exploration outweigh most government subsidy spending that often leads nowhere. Noa countered by evoking the high cost of space exploration and the uncertainty of any useful results.
In their allotted seven minutes, both sides indeed presented succinct and emotive arguments among which it would be difficult to objectively determine the better. In accord with the pre-agreed debating rules, a poll for and against the proposition is taken before the debate; the debater who produces the greater number of cross-over votes after the debate is the winner. Project Debater won by changing the minds of a greater number of the audience.
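The cross-over scoring rule is simple arithmetic, and a short sketch makes it concrete. The poll numbers below are invented for illustration; only the rule itself comes from the account above.

```python
# Hedged sketch of the cross-over rule described above: the winner is the side
# that gained more audience support between the pre- and post-debate polls.
# All numbers are made up for illustration.
def crossover_winner(pre, post):
    """pre/post are dicts like {'for': 40, 'against': 60} of audience percentages."""
    gain_for = post["for"] - pre["for"]
    gain_against = post["against"] - pre["against"]
    if gain_for > gain_against:
        return "for"
    if gain_against > gain_for:
        return "against"
    return "tie"

print(crossover_winner({"for": 40, "against": 60},
                       {"for": 55, "against": 45}))   # for
```

Note that under this rule the side that starts behind has the larger pool of persuadable voters, so winning the debate is not the same as winning the final vote count.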
In a second debate, the proposition was, “We should increase the use of telemedicine”. Project Debater once again won a greater number of cross-over votes. Faced with a previously unknown subject and proposition, and with no initial choice of side, in the far-reaching, diverse, dynamic environment of a professional debate, the machine had once again won.
The next year, Project Debater was improved and, now named Miss Debater in recognition of its synthesized female voice, boldly took on the 31-year-old world champion Debating Officer of the Cambridge Union Society, Harish Natarajan. The proposition was: “Should the government provide pre-school subsidies?” Miss Debater greeted her opponent respectfully, but tinged with a veiled menace,12
I have heard that you hold the world record in debate competition wins against humans, but I suspect that you have never debated a machine before. Welcome to the future!
Harish hesitated, but courteously nodded in response and the great debate began.
Miss Debater's reflexively gleaming façade's three blue orbs tantalizingly rotated as she searched her database of over ten billion sentences from 300 million newspaper, scientific journal, and other articles, and then meticulously but quickly formed an argument.
Harish was also combing his memory, but instead of computer memory fetches, and microprocessors churning logic for arguments, he was busily scrawling notes on a notepad outlining arguments for his position.
Miss Debater produced research data and media quotes to show that subsidizing preschools is not just a matter of finance, but a moral and political duty to protect some of society's most vulnerable children; again, a seemingly evocative argument rising above practical matters, albeit unfortunately sounding a little like a left-wing politician's speech.
Sensing and taking advantage of a general disdain for politicians among the public, Natarajan countered that too often subsidies function as politically-motivated giveaways to the middle class, and that even with subsidies there will be many who still cannot afford pre-school for their children; a realistic but rather cynical rebuke in response to what apparently sounded to the audience like a politician's bleeding heart jabber from Miss Debater.
Although Miss Debater occasionally flashed un-machine-like humor during the debate, she was likely incapable of cynicism, and perhaps because of the rather harsh political climate of conservative self-help at the time, Harish the human won over more of the audience to his side.13
After the debate, a commentator noted that “While Miss Debater was clearly better than Harish at citing meaningful facts, Harish had better rhetoric and better counters, and brought up tough rebuttals that Miss Debater did a poor job addressing”.14
After the debate, Natarajan, a man from the East but brought up in the West's debating tradition, proposed an amalgam of East and West for the future development of artificial intelligence,15
After the first minute of getting used to the shock of it not being a human being, or the surprise of what that actually meant, it became much like a human being … what the machine is better at than any human could ever be is finding relevant evidence, studies, examples, cases, and get that context. Another thing I think is very impressive was not only [Miss Debater's] ability to present evidence, but also its ability to explain why it matters in the context of the debate. Combining Project Debater's skills with those of a human would be incredibly powerful!
Is Man still in control and the Machine as helpmate the best combination? Miss Debater's demonstrated fact-finding prowess, logical argument formation, debating ability, and persuasiveness, for better or worse, sooner or later may very well obsolesce the entire class of lawyers in our society … and perhaps more distressingly, dispose of human and humane teachers and professors as well.
Apart from the rather serious contests of mental acuity, it seems that since video games were created on computers, it is logical to think that silicon-based machines (although having no thumbs) could also learn to play them, and play them well enough to defeat carbon-based human gamers in what is a decidedly fast-paced reflexive contest of manipulative skill, planning, and intelligence.
In action video games, the so-called “Non-Playing Characters” (NPCs) in video games exhibit what appears to be intelligence in surprise attacking, hiding, counter-attacking, and so on, while the agent player is engaged in the so-called optimal pathfinding from one point to another taking into consideration terrain, obstacles, enemies, adverse situations, and costs of certain actions, for instance types of weapons to procure and ammunition supply in war and “survival instinct” games.
The breakthrough came when an artificial intelligence player was matched against these physical barriers and apparently intelligent NPCs: the Toronto/DeepMind Technologies' convolutional neural network reinforcement-learning AI Video Gamer, which, with no a priori knowledge of even the rules of the game, defeated from scratch expert video game-playing humans in the highly competitive environment of the classic Atari games Beam Rider, Breakout, Enduro, Pong, Q*bert, Seaquest, and Space Invaders.
The University of Toronto's Volodymyr Mnih and colleagues’ goal was to create a single artificial neural network agent that is able to learn to play a variety of games. The network was not provided with any game-specific information or hand-engineered visual features, and was not privy to the internal state of the Atari 2600 emulator that was used in the competition.
The Toronto/DeepMind Video Gamer learned from nothing but raw pixel video input, the reward and terminal signals, and it generated a set of possible actions using purely experiential replay memory. There were no adjustments of the architecture, learning algorithm, or hyperparameters for different games, the Toronto Video Gamer performed just like a human player across all seven games, clearly demonstrating a robust general game-playing capability.
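The experience-replay memory mentioned above is a concrete data structure, and a brief sketch shows its essence. The capacity and batch size below are arbitrary placeholders, not the published hyperparameters; the point is that transitions are stored and sampled at random, breaking the correlation between consecutive frames.

```python
import random
from collections import deque

# Illustrative experience-replay memory of the kind the Atari agent used:
# (state, action, reward, next_state) transitions are stored in a bounded
# buffer and sampled uniformly at random for training. Sizes are invented.
random.seed(1)

class ReplayMemory:
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions fall off

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform random minibatch, decorrelating consecutive frames
        return random.sample(self.buffer, batch_size)

memory = ReplayMemory()
for t in range(100):                           # stand-in for gameplay frames
    memory.push(t, t % 4, float(t % 2), t + 1)

batch = memory.sample(32)
print(len(batch))                              # 32
```

In the real agent the "state" entries are stacks of raw pixel frames and the sampled minibatches drive gradient updates to the Q-network; here they are integers purely to keep the sketch self-contained.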
The Toronto Video Gamer defeated expert human players in three out of seven of the games (with ties), with superior benchmark performance. Although conceptually similar to checkers, video games rely more on rapid reflexive responses rather than the relatively deeper strategies of chess and Go, but computers are best at effective quick survey and response, as demonstrated in basic word processing.
In 2011, IBM's Watson mounted a Grand Challenge against two all-time champions of the popular “given the Answer, pose the Question” television game show Jeopardy. The hardware had to be formidable because, in order to ensure a fair match with humans depending only on their immediate memory, WatsonQA could not have any access to external information. The self-contained, forced-air-cooled, ten refrigerator-sized Power 750 servers that stored 200 million pages of information for 3000-core massively parallel processing sat backstage in a sound-proofed holding room that muffled the fans cooling the overheating processors.
Because Jeopardy is a contest of information knowledge and retrieval, and not the ability to accurately understand the moderator's speech, WatsonQA was simultaneously given the Answers in text that it would parse and parallel process tree-search threads based on keyword matching, factoids, grammar, verbal relationships, and risk management of responses (there is a penalty for incorrect responses and contestants can wager winnings in the later stages of competition). WatsonQA could quickly jump from branch to branch of the tree depending on how the combination of words was playing out to definiteness at the tree leaf node.
Natural language open-domain information is almost always imprecise, subject to context and often ambiguous, so the recognition within the millisecond range that champion contestants can press the buzzer to pre-empt competitors obviously required sophisticated natural language processing and search and logic processing speed.
WatsonQA used Wordnet, Wikipedia, and myriad other information sources, and was trained against a hundred previous Jeopardy winners to search and weight information as to its probable contribution to correct responses, and positively bias correct and negatively bias incorrect threads.
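The positive and negative biasing of evidence threads can be pictured with a toy weighting scheme. This is not IBM's scoring model; the source names, scores, and learning rate are all invented to illustrate the idea of shifting weight toward sources that backed correct responses.

```python
# Hedged sketch (invented numbers, not IBM's model): each candidate response
# accumulates weighted scores from its supporting sources; training against
# known-correct answers biases the source weights up or down accordingly.
weights = {"encyclopedia": 1.0, "newswire": 1.0}

def score(candidate_evidence):
    """candidate_evidence maps source name -> raw match score for a response."""
    return sum(weights[s] * v for s, v in candidate_evidence.items())

def train(candidate_evidence, correct, lr=0.1):
    """Positively bias sources behind a correct response, negatively otherwise."""
    sign = 1.0 if correct else -1.0
    for s, v in candidate_evidence.items():
        weights[s] += sign * lr * v

train({"encyclopedia": 0.9}, correct=True)    # encyclopedia backed a right answer
train({"newswire": 0.8}, correct=False)       # newswire backed a wrong one
print(weights["encyclopedia"] > weights["newswire"])   # True
```

Repeated over thousands of past Jeopardy clues, this kind of adjustment lets the system learn which evidence sources to trust for which kinds of questions.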
WatsonQA roundly defeated legendary Jeopardy champions Brad Rutter and Ken Jennings, and won a million-dollar Grand Challenge prize and $77,147 in game winnings, and could have continued to win every contest it entered, but for obvious reasons was banned from further Jeopardy competition. After the defeat, a chastened Jennings displayed an equanimity that Kasparov sorely lacked and Lee Sedol only acknowledged with a caveat,16
Just as factory jobs were eliminated in the 20th Century by new assembly-line robots, Brad and I were the first knowledge-industry workers put out of work by the new generation of “thinking” machines. “Quiz show contestant” may be the first job made redundant by Watson, but I’m sure it won’t be the last.
So far the machine has bested humans in contests of logic, speed, and memory, but those are what humans already regard as the computer's strength; can a machine beat man in the quintessentially human game of bravado, cunning, baiting, deceit, bluff, intimidation, and subterfuge that is Texas Hold’em poker? A high-stakes poker game victory would demonstrate that a machine can be vulgar as well as refined.
In 2017, a perfectly poker-faced computer from Carnegie Mellon University soundly defeated four top-ten professional players each in 120,000 hands of heads-up (two-player), no-limit (can bet total of chips owned) Texas Hold’em, with Professor Tuomas Sandholm and graduate student Noam Brown's Libratus ahead by $1,700,000 in simulated chips at the end of the 20-day competition.
Libratus employed game theory mathematical models of strategic interactions between rational decision-makers, something seemingly at odds with the dare-devil machismo of poker, but in truth directed at the calm assessment of the winning bet probabilities necessary for victory.
Indeed, behind the façade of bravado lies a cold-faced logic that parlays the luck of the draw with the skill of the bet using Nash equilibrium game theory and Monte Carlo simulations of probability distributions.17
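Monte Carlo estimation of win probability is easy to demonstrate on a drastically simplified game. The sketch below is not Libratus (which reasons over betting trees and hidden information); it merely estimates the equity of one card in a one-card showdown by dealing the opponent's card at random many times.

```python
import random

# Minimal Monte Carlo sketch (not Libratus): estimate the probability that a
# given card wins a simplified one-card, high-card showdown by repeatedly
# dealing the opponent a random remaining card and counting wins.
random.seed(42)

DECK = list(range(2, 15)) * 4        # ranks 2..14 (ace high), four suits

def win_probability(my_card, trials=20000):
    remaining = DECK.copy()
    remaining.remove(my_card)        # my card is no longer in the deck
    wins = 0
    for _ in range(trials):
        opp = random.choice(remaining)
        if my_card > opp:            # ties and losses count as non-wins
            wins += 1
    return wins / trials

p_ace = win_probability(14)
p_two = win_probability(2)
print(p_ace > 0.9, p_two < 0.1)      # True True
```

The exact equity of an ace here is 48/51 ≈ 0.94 (it loses only to the three remaining aces' ties), and the simulation converges toward that value; real poker solvers layer game-theoretic betting strategy on top of exactly this kind of probability estimate.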
Heads-up, no-limit Texas Hold’em has been a benchmark challenge for imperfect information situations where an agent does not have all the information (as one does in the board games by just looking at the board) but must still make critical decisions in order to be successful.
In Texas Hold’em, a player does not know what cards the opponent is holding, nor the order of cards not dealt, and so poker is like most real-life situations where decisions must be made with limited information. Imperfect information game theory is employed in more serious undertakings than poker, for instance in the stock market, business, geopolitics, and warfare.
The stakes are much higher, but given past human folly in those endeavors, in light of the demonstrated capabilities of Libratus and its progeny, perhaps we should leave more momentous decisions to machines rather than humans.
After his victory, Professor Sandholm was asked by a reporter what game might be beyond a computer's capability, to which he replied that “AI had already surpassed the best human and achieved superhuman performance”.
That was one-on-one poker. In 2019 Noam Brown's new multiplayer pokerbot Pluribus, in a six-player 12-day session of 10,000 hands of no-limit Texas Hold’em defeated 15 top professional players playing a hand every 20 seconds, more than two times faster than the best professional poker players.
Final score across these contests: Machine 21, Humans 1, with the lone human victory the result of a vote swing decided by a human audience, likely covertly betraying a subjective prejudice for the human over the machine, as overtly displayed by the audience at the Deep Blue-Kasparov chess match. Indeed, Harish Natarajan said after the debate that “I felt I had an advantage because I was not a machine, I was a human”. That emotional advantage may well persist in the eyes of many observers, but the machine's functional advantages would become clearer as its domain broadened.
Up to this point, artificial intelligence has been used for playing games; if the idea was just to prove that machines can think, the machine has already surpassed humans in restricted-domain endeavors. With that capability, aside from trivial quiz-show games like Jeopardy, WatsonQA could just as well provide more useful functions: for example, given its supreme ability to know the question from the answer (is that diagnosis?), assisting physicians in the treatment and alleviation of suffering, or robot physicians even taking over a patient's entire medical treatment, while at the same time creating new business opportunities for the machine's proprietors.
In the Jeopardy-like medical diagnosis game Doctor's Dilemma, there was once an answer:
The syndrome characterized by joint pain, abdominal pain, palpable purpura, and a nephritic sediment
The question is (of course) “What is Henoch-Schönlein purpura?” The same game format used in Jeopardy can be used by WatsonQA in serious and indubitably non-trivial medical diagnostics, which require knowing the question when you have the diagnosis.
In addition to the on-board memory of the Jeopardy machine, an IBM Watson Health doctor or physician's-assistant machine could be connected to the Internet and thereby immediately access many different medical information sources and the latest developments in research, then quickly analyze, diagnose, and advise treatment, all with a competence that a local general practitioner would be hard pressed to match.
IBM Watson Health has programs for example in oncology, genomic interpretation, and diabetes management. It employs Nuance's speech recognition software specifically designed for medical terminology to act as a patient interface, so in response to a query, medical information from the cloud can be searched in seconds, and questions and answers are processed in the cloud, allowing access to voluminous medical information and expert diagnosis for anyone with a verified connection.
For instance, a physician might say to Watson Health, “My patient has had digestive issues and has lost interest in bowling, her favorite pastime.” IBM's Blue Gene supercomputer then searches the Diagnostic and Statistical Manual of Mental Disorders (DSM) for “lost interest” and classifies that as a symptom of depression, and then scans journals looking for the logical AND of “depression” and “digestive problems” and finds an article on celiac disease, an autoimmune disorder. If there are other articles supporting this diagnosis and no clearly contradictory evidence found, the physician can order lab tests to confirm or dispel the celiac diagnosis. If confirmed, a gluten-free diet would be advised, and the patient hopefully will soon be happily back at the bowling alley.
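The logical-AND literature search in that anecdote reduces to a simple operation, sketched below. The articles and their texts are entirely fabricated stand-ins; nothing here is IBM's actual retrieval system, which operates over natural language rather than substring matching.

```python
# Toy version of the literature search described above (purely illustrative):
# return articles whose text contains ALL query terms -- the logical AND of,
# say, "depression" and "digestive". Article texts are invented.
ARTICLES = {
    "celiac-disease-review": "digestive problems and depression in celiac disease",
    "sports-injuries": "joint pain after exercise",
    "sleep-study": "depression linked to poor sleep",
}

def search_all(terms):
    """Case-insensitive AND-search over the article texts."""
    terms = [t.lower() for t in terms]
    return sorted(name for name, text in ARTICLES.items()
                  if all(t in text.lower() for t in terms))

print(search_all(["depression", "digestive"]))   # ['celiac-disease-review']
```

Only the conjunction of both symptoms isolates the celiac article, which is exactly the narrowing step that points the physician toward a confirmatory lab test.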
To refine the automated diagnoses, Watson Health could assign weights to articles, for example based on the number of positive citations, and consider for example epidemiological factors that increase the probability of a given diagnosis in a given region.
Eighty percent of healthcare data is unstructured, and just like Jeopardy's WatsonQA, Watson Health can read and understand unstructured data by natural language processing to identify, classify, and encode clinical information from virtually any source, and just as in every other field today, Big Data can substantially improve medical predictive analytics with voluminous data.
An artificial neural network trained on massive amounts of medical data is used by Watson Health to classify diagnoses, with reinforcement learning improving its diagnostic accuracy. Again, a human cannot endure 24/7 training and tireless study, and indeed many controlled experiments have shown that Watson Health's machine learning can be equal to if not superior to diagnoses by human physicians.
Once up and running, Watson Health can perform problem identification and automatically produce a summary of care from a patient's medical record, then after classification of patients with clinical similarity, dynamic patient cohorts can be created for path selection for a given group of patients, with the optimum care paths becoming an integral part of healthcare Big Data available to all practitioners.
For medical research, Watson Health can find information in the medical literature to support new hypotheses and create new diagnostic tools; for example, quickly scanning and reading a complete collection of medical literature such as the Medline database, and from there identifying documents that are semantically related to the research topic in question.1
Notwithstanding, many practicing physicians oppose the use of artificial intelligence for diagnosis or other medical matters, often citing a machine's lack of empathy for the patient (as well as sympathy for the one-upped doctor), and the likely increased liability risk of malpractice stemming from machine misdiagnosis.
Furthermore, having a machine take over medical diagnosis is against a human physician's professional self-interest, and the storied arrogance of some physicians may prevent their acceptance of machine diagnosis, no matter how great the benefits.
Perhaps the debater Harish Natarajan's suggestion of man/machine collaboration would encourage a machine doing the hard work of diagnosis with the human physician deciding on treatment and handling the emotional care of the patient, with augmented machine malpractice insurance, and premiums paid from the higher profits derived from the lower costs of research- and diagnosis-performing robots compared with human physicians.
This is in accord with the first sentence of the Hippocratic oath below, but should the physician's unease and/or pecuniary concerns at the thought of a robot physician taking over patient diagnosis outweigh the duty set forth in the second sentence below?2
I will remember that there is art to medicine as well as science, and that warmth, sympathy, and understanding may outweigh the surgeon's knife or the chemist's drug.
I will not be ashamed to say “I know not”, nor will I fail to call in my colleagues when the skills of another are needed for a patient's recovery.
When the “colleagues” are physician's assistant machines or very capable Doctor Robots, are human physicians violating their Hippocratic oath when they are not receptive to artificial intelligence to help care for a patient?
An artificial intelligence system for the automatic recognition and diagnosis of lung cancer from images of diseased and normal tumors was developed by the NYU School of Medicine using tumor image data downloaded from the Cancer Genome Atlas, which was prepared from expert pathologists' detailed microscopic examinations of tumors and their diagnoses. This data constituted a training set of 800,000 images from 1200 cases of diseased and healthy lungs for machine learning. The Google Inception v3 computer vision convolutional neural network (CNN) learned diagnosis from the image data, and after two weeks of training, the CNN could correctly diagnose tumors at 97% accuracy, better than the three expert pathologists who served as a control group.
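At the heart of a CNN like Inception v3 is the convolution operation, which slides small filters over an image and responds strongly where the image matches the filter's pattern. A minimal numpy sketch (with a hand-made edge-detecting filter standing in for a learned one, purely for illustration) shows the mechanic:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel over the image, producing a feature map
    whose values are large where the image matches the kernel."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy 6x6 "tissue image" with a bright vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# An edge-detecting kernel: responds strongly at left-to-right transitions
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (5, 5)
print(feature_map.max())  # 2.0, at the edge columns
```

In a real CNN the kernels are not hand-made but learned from the training images, and hundreds of such feature maps are stacked and fed through further layers before the final diagnostic classification.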
Taking a step further, NYU's CNN was asked to extract more than just the cancer diagnosis from the training set images. Expert oncologists cannot discern genetic mutations solely from images of tumors, but rather must read the tumor's DNA sequencing and compare it with the normal DNA of the patient to detect genetic mutations, a tedious and error-prone process.
The NYU automatic cancer tumor diagnostic tool was able to automatically predict the mutational status of a key lung cancer-driving gene with greater than 80% accuracy, and it was found that more training set data could further increase that accuracy rate to far surpass the best human detections of genetic mutations.3
There was great hope that artificial intelligence could help to overcome the transformative coronavirus pandemic of late 2019 that decimated societies and economies worldwide. Many turned to artificial intelligence in the hope that AI could find ways to predict the outbreaks and attack the virus.
Since this was a novel virus, there was insufficient data to model the spread of the coronavirus pandemic. A Kalman filter, for example, as used in predicting the position of a self-driving automobile on a journey from trip data, measures the current motion state vectors and estimates their uncertainties, then updates and weights the data as more information from multiple sources is collated. The Kalman filter would require similar data to predict the locations of virus spread.4
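The predict/update cycle of a Kalman filter can be sketched in one dimension; this toy version (with made-up measurement and noise values, purely for illustration) tracks a position estimate and its shrinking uncertainty:

```python
def kalman_predict(x, p, u, q):
    """Motion step: shift the estimate by the commanded motion u;
    process noise q grows the uncertainty."""
    return x + u, p + q

def kalman_update(x, p, z, r):
    """Measurement step: blend the prediction (variance p) with a noisy
    measurement z (variance r) using the Kalman gain."""
    k = p / (p + r)  # gain: near 1 when the prediction is very uncertain
    return x + k * (z - x), (1 - k) * p

# Track a 1-D position: start very uncertain, then alternate noisy
# position fixes with one-step motion predictions.
x, p = 0.0, 1000.0           # initial estimate and (large) variance
for z in [5.0, 6.0, 7.0, 8.0]:
    x, p = kalman_update(x, p, z, r=4.0)
    x, p = kalman_predict(x, p, u=1.0, q=2.0)
print(round(x, 2), round(p, 2))  # estimate converges near 9.0
```

The same blend of prediction and noisy observation is what an epidemic-tracking filter would perform, if the case data existed to feed it.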
This data was gleaned for covid-19 spread prediction by, among others, a Canadian company called BlueDot that continuously collected online disease-related news, official reports, social media mentions, and air traffic data, and then cross-referenced the data with the National Institutes of Health and Global Microbial databases. A natural language processing (NLP) algorithm correlated the data through the interpretation of a focal word (for instance, “fever”) that influences the interpretation of other words, thereby helping to identify covid-19's distribution. This allowed BlueDot to predict 127,000 cases by March 30, 2020, and outbreaks in China, Italy, Iran, and the United States that were spot-on; however, the new mutation variants that have since appeared have been more difficult to model.5
There are more than 200 viruses known to infect human beings, each with different infection mechanisms, behavior, and responses to treatments and vaccines. When the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus enters the body, mostly through the mouth or nose, it infiltrates healthy cells by binding to receptors on the surface of human cells by means of the protein spikes studded on the coronavirus surface. Infiltrated by the virus, the cell then replicates the viral RNA, as well as the structural proteins needed to assemble new viral particles, which are then released into the body, causing an infection.
Since there is no known cure for covid-19, it was necessary to find a vaccine to control the pandemic. Universities and research institutions all over the world pursued at least eight different types of vaccine, including inactivated-virus, DNA, and RNA vaccines.
The approach was to find protein antibodies that can recognize and bind to parts of the virus, and thus stimulate an immune response. There are, however, tens of thousands of possible viral targets. DeepMind's AlphaFold neural network was used to predict the three-dimensional shape of the coronavirus's proteins based on its genetic sequence; machine learning was then employed to predict which parts of the virus could serve as targets, based on training set data from known pathogens.
It was found, reasonably enough, that the spike proteins arrayed on the surface of the virus were the best targets for rendering the virus incapable of binding to the human cells. The targeting proteins, conventionally inactivated viruses, are then integrated into vaccine candidates and tested for immune response.
Instead of inactivated viruses, however, DNA and RNA genomes can mimic a part of the virus’ genetic sequence to prompt the cells to produce the antigen that triggers an immune response.
Because of the proteins' 3D structure, mimicking viral proteins requires solving the complex structure-prediction problem known as protein folding, as addressed by DeepMind and Moderna, among many others, and artificial intelligence was used to design and synthesize the genetic components of DNA-based vaccines that to date have been largely successful.6
Practically every major automobile manufacturer, ride-hailing service, and information industry tech giant is in the process of developing self-driving cars. In an attempt to standardize classification of a burgeoning new industry, the Society of Automotive Engineers (SAE) has established its levels of driving automation, from Level 0 (no automation) to Level 5 (full automation), for the automated driving automobile industry.
Early autonomous cars undergoing road testing can be easily identified by the stark light detection and ranging (LIDAR) tower on the car roof sweeping out low-powered laser beams to map the vehicle's surroundings without blinding passersby.
The laser beam is reflected by objects, and photoelectric cells pick up the return beams and convert the light intensity to electric current. From the time of flight of the returned light signals, the distance of scanned objects is measured, and from the change in wavelength upon reflection, the relative motion of scanned objects and the car can be computed using the Doppler effect.7
Fusing information from LIDAR, radar, sonar, odometry, GPS, inertial measurement units (IMUs), computer vision, and navigation systems, Bayesian simultaneous localization and mapping (SLAM) software constructs a point cloud map of the surroundings while keeping track of the car's position within that environment. Following the SLAM output, actuators and servos on the car control the direction, speed, and braking of the car for autonomous driving. Object recognition is by computer vision pattern recognition employing deep artificial neural networks that can learn from training sets of actual driver experience.
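The localization half of SLAM is Bayesian at its core: a belief over positions is repeatedly reweighted by sensor evidence and shifted by motion. A minimal histogram-filter sketch (a toy five-cell "road" with made-up landmark labels, not a production SLAM system) illustrates the cycle:

```python
def normalize(p):
    s = sum(p)
    return [x / s for x in p]

def sense(p, world, measurement, p_hit=0.9, p_miss=0.1):
    """Bayesian measurement update: weight cells matching the sensed
    landmark higher, then renormalize."""
    q = [pi * (p_hit if cell == measurement else p_miss)
         for pi, cell in zip(p, world)]
    return normalize(q)

def move(p, step):
    """Shift the belief by the commanded motion (cyclic world, exact motion)."""
    n = len(p)
    return [p[(i - step) % n] for i in range(n)]

# A five-cell road where the car senses "tree" or "open" at each cell
world = ["tree", "open", "tree", "open", "open"]
p = [0.2] * 5                 # uniform prior: position unknown

p = sense(p, world, "tree")   # saw a tree: cells 0 and 2 become likely
p = move(p, 1)                # drove one cell forward
p = sense(p, world, "open")   # now saw open road: cells 1 and 3 dominate
print([round(x, 3) for x in p])
```

A real SLAM system replaces the five cells with a point cloud, the two landmark labels with fused LIDAR/radar/vision features, and the exact motion with a noisy odometry model, but the predict/update structure is the same.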
Of course, the more training data, the more apposite is the response of the autonomous car; that is, the autonomous car learns to drive as a human does, gaining skill by driving more. If the driving data of a full fleet of autonomous vehicles is loaded into a computer and serves as a training set for all the autonomous cars in the fleet, the shared driving experience gained will far surpass what any one human can accumulate over a lifetime of driving.
Studies have shown that over 90% of automobile accidents are the result of human error, so if all the cars on the road were autonomous, the wealth of driving experience and proper responses to traffic, obstacles, and pedestrians would significantly increase overall traffic safety. At the very least, the driverless car will do away with drunken and reckless driving, and road rage.
A major obstacle to the more general use of self-driving cars has been the initial psychological disinclination to turn control over to a machine in a potentially dangerous environment. Research has shown, however, that after pronounced misgivings upon getting into the car and starting up, once about ten minutes have passed without anything untoward happening, even control-freak drivers are quickly at ease letting the car do the driving.
Furthermore, there is technology to dispel misgivings: for example, soft blue panel lighting or mood music while the car is safely moving along, lights turning to yellow and faster music only when something is amiss, and red lights and blaring beeps calling for driver intervention when needed.
In the complex mass manufacture of liquid crystal displays (LCDs), in spite of almost completely automated production, final inspections of LCD panels were done visually, and if a defect in a panel is found, it can cause an entire production run to be downgraded or simply discarded. Moreover, the cause of a defect, and when and how it was generated, is often difficult to ascertain, such that a defect whose cause is unknown has been ruefully called mura, a generic Japanese term for “irregular” or “non-uniform” and a word used by car manufacturers for wasteful unevenness in production.
China Star, a subsidiary of TCL, the world's third largest producer of television sets, engaged IBM Watson to develop an Artificial Intelligence LCD Panel Inspection System that obviates human visual inspection of defects using computer vision convolutional neural networks to automatically detect, identify, and classify defects from pattern recognition comparison with a database of defect images and their causes.
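The classification step can be pictured as matching a scanned patch against a database of known defect patterns; this toy nearest-template sketch (3×3 patches and hand-made patterns, standing in for the real system's convolutional network and defect image database) conveys the idea:

```python
import numpy as np

# Toy 3x3 "panel patches": a database of known defect patterns
templates = {
    "scratch": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], float),  # vertical line
    "spot":    np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], float),  # single pixel
    "clean":   np.zeros((3, 3)),
}

def classify(patch):
    """Nearest-template matching: label the patch with the defect class
    whose stored pattern it most resembles (smallest pixel distance)."""
    return min(templates,
               key=lambda name: np.sum((patch - templates[name]) ** 2))

# A slightly noisy vertical line still classifies as a scratch
scanned = np.array([[0, 1, 0], [0, 1, 0], [0, 0.9, 0]])
print(classify(scanned))  # scratch
```

A CNN generalizes this by learning the comparison features itself rather than using raw pixel distances, which is what makes it robust to the enormous variety of real panel defects.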
The almost completely automated LCD fabrication assembly line has robots at virtually every station. In addition to a final inspection scan, CMOS sensors mounted on the robots' heads can scan the panels for defects at critical fabrication stations; if a defect is found, the production line can be stopped at that stage, the process adjusted as needed to prevent the defect, and production then continued, saving time and avoiding a completely unproductive production run.
The trained AI inspection system can also store new defect data in the database in real time, and thus improve its defect detection skills as it works on the assembly line, and so increase yield.8
There are many online program-drafting competitions organized by programmers themselves that have attracted developers of code-writing machines to generate human-readable source code given a programming objective in an input-output test.
Automatic program generation employing recurrent neural networks (RNNs), wherein hidden layers receive their own earlier activations as input and so carry state from step to step, can model sequential programming steps to formulate a working program that satisfies the output test objective.
Researchers at Microsoft and Cambridge have developed a Learning Inductive Program Synthesis (LIPS) machine called DeepCoder that uses text recognition to determine the attributes of the programming language for a specific task, generating a programming dataset of the character-by-character sequences used in the C++ and Python high-level computer languages. LIPS learns the probability distributions of attributes for the given programming task employing an artificial neural network, and then, guided by the machine-learnt input-output mapping derived from the task objectives, searches existing computer programs for program steps consistent with the input-output test objectives.9
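The core idea of searching for program steps consistent with input-output examples can be shown in miniature. This sketch is not DeepCoder's actual algorithm: it uses a tiny made-up DSL of four list primitives and plain enumeration where DeepCoder uses a neural network to rank which primitives to try first, but the search-until-consistent structure is the same:

```python
from itertools import product

# A tiny DSL of list-transforming primitives (illustrative only)
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort":    lambda xs: sorted(xs),
    "double":  lambda xs: [2 * x for x in xs],
    "drop1":   lambda xs: xs[1:],
}

def run(prog, xs):
    """Apply a sequence of primitive names left to right."""
    for name in prog:
        xs = PRIMITIVES[name](xs)
    return xs

def synthesize(examples, max_len=3):
    """Enumerate programs in order of length and return the first one
    consistent with every input-output example."""
    for length in range(1, max_len + 1):
        for prog in product(PRIMITIVES, repeat=length):
            if all(run(prog, i) == o for i, o in examples):
                return prog
    return None

# Find a program mapping [3, 1, 2] -> [6, 4, 2] and [5, 4] -> [10, 8]
examples = [([3, 1, 2], [6, 4, 2]), ([5, 4], [10, 8])]
print(synthesize(examples))
```

Brute-force enumeration explodes combinatorially with program length; DeepCoder's contribution is the learned guidance that prunes this search.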
In 2020, OpenAI, co-founded by Elon Musk, announced its Generative Pre-trained Transformer version 3 (GPT-3), which could not only write computer programs but, trained on an enormous database crawled from the Internet and holding billions of parameters, could through reinforcement and unsupervised learning compose prose and poetry, and indeed write any text up to 50,000 words.
These dependable, tireless, non-complaining robot programmers are particularly suited to the Red Bull®-driven all-night sessions of intensive programming and the unerring placement of all the semicolons and parentheses in C++, and they will need only electricity for sustenance, with no infrastructure requirements of free Coke®, La Croix®, nuts, pizza, fried won tons, and ping-pong tables.
The ideas for new program applications are at present the province of human ingenuity, but it appears that as GPT-3 and its progeny gain programming skill, they will develop entirely new programming techniques and find new uses and entirely new areas for computer programming.10
The stand-alone digital assistant can handle personal and business communications, providing information in synthetic speech in response to human speech commands. Its mobile derivative, the service robot, can be a helpmate and companion.
Everyone is familiar with the speech commands understood and responded to by today's personal computers, smartphones, digital assistants, and service robots, which actually do a fairly good job. From the continuous speech recognition development and natural language front-end recognizers sponsored by America's Defense Advanced Research Projects Agency (DARPA), to the clumsy adventures of Apple's pioneering but flawed Newton and its early malapropistic Siri, great strides have been made in natural language processing (NLP) and intention-driven interfaces such as Nuance, Apple's intelligent Siri, Wolfram Alpha, IBM Watson, Google Assistant, Now, Nest, Microsoft Cortana, and Amazon Echo and Alexa, all of which are probing the “known unknowns” of ambiguity and inference in speech.
These robots are already among us, either helping or bewildering us. Will they eventually replace the entire service class in business, government, and the home? And although they can appropriately respond to human voice commands, will they ever really “understand” human needs and be able to deliver information like learned humans?
The search for extraterrestrial life is one of humankind's most compelling pursuits. Presently over four thousand planets orbiting relatively nearby stars have been discovered, more than half of those discoveries made by the Earth-trailing, heliocentrically orbiting Kepler Space Telescope (KST) launched in 2009.
KST looks for the tiny periodic changes in stellar brightness caused by exoplanets moving in front of stars. However, in 2012 and then in 2013, two of Kepler's four stabilizing directional reaction wheels failed, and the space telescope could not hold a stable pointing position, resulting in less precise and very noisy observational data.
However, as long as KST was still looking, there should be exoplanets passing through its field of view at the same rate as before, so Anne Dattilo at the University of Texas designed AstroNet-K2, a deep neural network that, after training on known exoplanets, could systematically remove the instability and noise from KST's signals, and not only reveal new exoplanets but also find exoplanets in the old observational data that even experienced exoplanet astronomers had missed.
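The signal being hunted, a periodic dip in stellar brightness, can be demonstrated on simulated data. This sketch (synthetic light curve, invented noise levels and period, and simple phase-folding rather than AstroNet-K2's neural network) recovers a transit period from noisy flux:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a Kepler-style light curve: constant star brightness with a
# 2%-deep, 3-sample transit dip every 50 samples, plus measurement noise
n, period, depth = 1000, 50, 0.02
flux = 1.0 + rng.normal(0, 0.004, n)
for start in range(10, n, period):
    flux[start:start + 3] -= depth

def folded_depth(flux, trial_period):
    """Phase-fold the light curve at a trial period and measure how far
    the deepest phase bin falls below the typical brightness."""
    phases = np.arange(len(flux)) % trial_period
    binned = np.array([flux[phases == ph].mean()
                       for ph in range(trial_period)])
    return np.median(binned) - binned.min()

# The true period shows a much deeper folded dip than wrong trial periods
best = max(range(40, 80), key=lambda p: folded_depth(flux, p))
print(best)  # recovers the injected 50-sample period
```

Folding at the correct period stacks all twenty transits into the same phase bins, lifting the dip far above the noise; at a wrong period, the transits smear out and the dip vanishes.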
Furthermore, from December 2016 to March 2017, as Mars passed through the crippled KST's field of view, its direct and scattered light obfuscated any exoplanet signatures, but AstroNet-K2 heroically discovered two exoplanets through all the instability, noise, and the reflective glare of the Red Planet.
One detected exoplanet was a super-Earth-sized, volatile-enveloped “puffy” planet whipping around a Sun-like star with a 13-day period and a surface temperature of 750°C, a little too hot for humans, but fast-paced, heat-loving beings would love the quickly passing seasons of super-tropical weather.
The second exoplanet was also super-Earth-sized, but with an even shorter 3-day period that would truly make “the hours pass like minutes”, and a surface temperature of 1400°C, hot enough to melt aluminum, let alone humans. One wonders what the beings on these exoplanets would look like.11
The first Earth-sized exoplanet was discovered in 2015, also by the impaired Kepler Space Telescope. Prosaically named Kepler-452b, it orbits in the habitable Goldilocks zone around Kepler-452, a star similar in size to our Sun, with an orbital period of 385 days, almost exactly the same as an Earth year. But because Kepler-452b's star is older than our Sun by about 1.5 billion years and considerably brighter, Kepler-452b is slightly warmer than our Earth; assuming a suitable atmosphere and pressure, however, it is amenable to an H2O triple point, and therefore has potential for human-like habitation. At about two times the size of Earth, it has stronger gravity, so any Earth 2.0 animals would not need much fur, and its humanoids would be tanned, stocky, and very muscular.

If these super-strong “Kepler-452b-ings” are intelligent enough to develop themselves or their robots to travel the 1402-light-year distance to Earth at close to the speed of light, it would take them at least 1400 years to reach and colonize Earth (and in the absence of spacetime wormhole travel, according to Einstein's theory of special relativity, they would grow even more massive but age much more slowly on the journey because of time dilation, the latter relativistic effect allowing them to discount some of their accumulated travel years).12
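The size of that relativistic discount is easy to compute; assuming, for illustration, a cruise at 99% of light speed, the Lorentz factor shrinks the travelers' elapsed time dramatically:

```python
import math

def proper_time_years(distance_ly, speed_fraction_c):
    """Ship time for a trip: the Earth-frame travel time divided by the
    Lorentz factor gamma = 1 / sqrt(1 - v^2/c^2)."""
    earth_time = distance_ly / speed_fraction_c
    gamma = 1.0 / math.sqrt(1.0 - speed_fraction_c ** 2)
    return earth_time / gamma

# A 1402-light-year trip at 99% of light speed:
print(round(1402 / 0.99, 1))                    # ~1416 years pass on Earth
print(round(proper_time_years(1402, 0.99), 1))  # ~200 years pass on board
```

Even so, two centuries of ship time is a long voyage, which is why the text's estimate of "at least 1400 years" refers to the time elapsed on Earth, not aboard.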
In light of this dire possibility, in 2017 astronomers at MIT and the Carnegie Institution for Science released two decades of data, along with the software and an online tutorial to analyze that data, and called on amateur astronomers to help with observations of the more than 1600 stars within 325 light years of Earth, in the hope that “fresh eyes” would quickly find new nearby exoplanets, not only for the scientific adventure but also to gain time to prepare for eventual alien landings on Earth.13
The Kepler Space Telescope could have helped, but it finally ran out of fuel and was officially retired on October 30, 2018. Fortunately, a new space telescope, the Transiting Exoplanet Survey Satellite (TESS), had been launched on Elon Musk's SpaceX Falcon 9 rocket on April 18, 2018, and is continuing the epochal search for extraterrestrial life, no doubt primed to find nearby exoplanets with the help of artificial intelligence machines.
The few hundred years of travel time makes it incumbent upon us Earthlings to train TESS to quickly find the nearby exoplanets, and to communicate with them if possible. Advanced extraterrestrial beings that meant no harm would no doubt first communicate and send advance scouts before landing; otherwise, we should prepare our defenses against alien attack.
Well before such an invasion takes place, for our very survival, Earthlings must develop artificial intelligence to supplement our meager native intelligence, and send our robots to get to them first. In this sense, we will need AlphaGoZero's supreme intelligence not just to amaze us, but more importantly to save us.
AstroNet-K2 could automate much of the work of exoplanet hunters, working tirelessly at any time and place under any conditions, and without the biases that humans might have, particularly in the urgency to find Earth-like planets peopled with human-like beings in this most glamorous, and possibly most critical, field of astronomy.
Are AI machines really and truly intelligent? “Intelligence” has been variously and controversially defined, perhaps the most general being,1
The ability to acquire and apply knowledge and skills
All the game-playing and working machines described easily satisfy the elements of this definition, so coming to the rescue, cognitive psychologists have added,2
The ability to perceive or infer information, and to retain it as knowledge to be applied towards adaptive behaviors within an environment or context
The machines can still easily pass this test of intelligence, so in an apparent ad hoc attempt to distinguish human intelligence from animal intelligence, these definitions have been buttressed with,3
understanding, reasoning, critical thinking, planning, emotional knowledge, creativity, consciousness, and self-awareness
“Understanding”, “reasoning”, “critical thinking”, and “planning” are clearly in danger of circular definition. What is meant by “emotional knowledge” is anybody's guess, but if it means the perception of emotion in others, then modern computer vision's facial recognition of mood clearly fills the bill.
As for creativity, AlphaGo's “shoulder hit” move 37 in Game 2 has been lauded by an expert commentator as a truly creative, original move beyond human teaching.4
Lee Sedol before the match had confidently predicted “total victory”, but he was astounded by AlphaGo's move 37, calling it a “spark of genius”, the hallmark of human creativity.
In Game 4, AlphaGo's performance declined after Lee Sedol's “divine wedge move” 78, perhaps an indication of a machine's emotional knowledge of a formidable challenge, and possibly a kind of self-awareness that it had no appropriate response. Indeed, game-playing computers equipped with a “probability of loss” index can assess the likelihood of ultimate defeat and display a forlorn “resign” output.
Given time to recover after the loss, AlphaGo went on to win Game 5 with many of what Lee Sedol later described as “weird moves”, which could be seen as acknowledging the opponent's new moves, reasoning about them, critical thinking of its responses in Game 4, and subsequent planning for Game 5, all evincing an acute adaptive intelligence.
After the epochal match with AlphaGo, a chastened but now more formidable Lee Sedol went on to win all of his subsequent matches against human Go Masters, stating that, contrary to what Ke Jie had feared, he had “learned from AlphaGo”, and that “it has changed the way Go would be played in the future”. That is, the machine had something new to teach the supremely expert human, in accord with the board game adage,
Sometimes you give them a lesson, sometimes they give you a lesson.
That leaves the consciousness and self-awareness elements of the augmented definition of intelligence to be addressed. Consciousness could be no more than a brain's particular neuron activation pattern, which occurs naturally in animals and humans and artificially in computer neural networks.
Oxford mathematical physicist Roger Penrose has suggested that the realization of a particular recognition is an instant of intelligent consciousness. For example, when viewing Escher's famous Angels and Demons, typically the demons are seen first, and then, in a sudden flash of perception, the angels are also discerned; further observation reveals that the fractal pattern is never-ending, extending forever to infinity. In observing the drawing, each new step of recognition constitutes a new awareness and thus a newly conscious intelligence.5
With an optical illusion, one can never know when a human will recognize the illusion; in machine pattern recognition, however, the decisional probabilities of the two readings should be equal, meaning that if an illusion is palpable, the machine will know at once that there are two possibilities, each at 0.5, while a human must wait for conscious awareness of the possibilities, and some humans will never be able to discern the different illusions. In this sense, the computer vision machine's perception of the possibilities is superior.
This shows that an artificial neural network's pattern recognition capability can be equal or superior to human pattern recognition, and as such represents an awareness and consciousness similar to a human's that can satisfy those elements of the cognitive psychologist's definition of intelligence.
Continuing with the definition, self-awareness is something that a non-playing character (NPC) in a video game appears to possess, apparently knowing what he/she/it is capable of vis-à-vis the agent-player and the game environment, so the computer program and the hardware and software that drive the NPC apparently also have some modicum of self-awareness. However, is that merely an awareness designed by the human programmer? That is, is it the human who created the machine actually providing the intelligence that the machine is apparently displaying?
A human, once conscious and aware of a concept, for example the existence of two opposing states in an optical illusion, can sort out the situation by reasoning about the concepts in the abstract. Many can agree that mathematics is something that requires that abstract reasoning, and in almost every society there is an often uneasy respect for the very apparent intelligence of those who are good at mathematics, which may be just the ability to abstractly and logically reason in its purest form. A definition of mathematics is:6
A discipline that logically investigates, inductively and deductively, the relationships among concepts in a very compact form that gets to the heart of the matter
If there is a touchstone of intelligence, perhaps it is this ability to do mathematics, as many have believed, the purest manifestation of abstract human thought. Doing mathematics can be defined as
the discovery of relational aspects of disparate functions that can lead to some reasonable conclusion.
Instead of a human-invented discipline, Plato for one saw mathematics as an æthereal logic residing in the Heavens but controlling the Earth, flittingly visited by mankind through an incipient awareness of mathematical forms; as Penrose explains it, mathematical concepts and mathematical truths inhabit an actual world of their own that is timeless and without physical location, distinct from the physical world but in terms of which the physical world must be understood, and our minds do have some direct access to this Platonic realm through an “awareness” of mathematical forms and our ability to reason about them.
Galileo believed that the Universe is a grand book written in the language of mathematics, and Penrose thought that only humans can access the Platonic world and Galilean Universe of mathematics as it is a highly specialized and peculiarly human activity. Indeed as Penrose added, some might say that it is an activity confined to certain peculiar humans.
The artificial intelligence pioneers Allen Newell and Herbert Simon, with a computer program written by John Shaw at the RAND Corporation in 1955, confronted the classic mind-body problem of whether a machine composed of inanimate matter can have the thinking capability of an animate mind.
Their Logic Theorist was the first computer program specifically designed to simulate the thinking process in problem-solving by the human mind, and chose the abstruse discipline of proving mathematical theorems as the testing ground. It proved 38 of the first 52 elementary theorems in Alfred North Whitehead and Bertrand Russell's epochal tome on the logical basis of mathematics, Principia Mathematica. Proving theorems is certainly doing mathematics, but the Principia theorems selected were very basic.7
Not so basic was Paul Dirac's electron equation, which by the appearance of a ± in front elegantly foretold the existence of the anti-matter positron, and Albert Einstein's gravitational field equation, wherein the Riemann tensor revealed the manifold structure of the Universe. That makes mathematics at least the handmaiden of physical reality, and if the machine is to surpass human abilities, it will have to demonstrate that mathematical capability.8
In 2018 AlphaGoZero's inference engine AlphaZero, with no human input or training, learned how to play board games simply by playing, and although designed by humans, improved to be the very best player in each of those games simply by autonomously playing against itself.
So if AlphaZero can by itself do the abstract, creative manipulation, and strategic thinking required in matching wits in the structured logic of chess and Go, in fact reasonably performing just the relational aspects (stone positions) of disparate functions (chess piece moves) leading to a desired conclusion, then it is doing mathematical thinking and is thereby intelligent in and of itself.
IBM Watson, of course also created by humans, has found relational aspects of disparate functions far beyond what its human designers could find to defeat very accomplished humans in debating and Jeopardy; its intelligence in that regard clearly surpassing that of its creators.
Following these examples, an “AlphaMathZero” or “IBM MathQA”, with some supervised training from the best mathematicians doing classical math problems, and then by doing millions of problems, improving along the way by reinforcement and unsupervised learning, in principle could become a superlative robot mathematician.
Today's computers of course can numerically solve complex non-linear differential equations and have proven such abstruse mathematical conjectures as the classic four-color map theorem.9
However, the former is done by the finite-differences number-crunching formulated by humans, and the latter by try-all-possibilities brute force, and as such are not really “thinking” but merely following an iterative process to its end. It seems that it is really the machine's designers who are paving the way, and the machine is merely mechanically following in the path.
The proprietary software application program Socrates can read math problems and produce step-by-step solutions and explanations, but is based on categorizing known solutions for use in mathematics education, just finding an appropriate solution to a given problem by search, and not actually solving the problem itself.
The human visual cortex has 140 million neurons, and Google's deep convolutional neural network (DCNN) only has a few million artificial neurons, but ever more massive and efficient DCNNs are under development, and it is conceivable that a very deep CNN, through reinforcement learning and a large variety of algorithms, could scale up to human brain capacity.
Researchers at Facebook have used a neural network to map input sequences to output sequences, as in speech recognition, and applied that to sequences of mathematical symbols in equations to map integral and ordinary differential equation problems to solutions. The mathematical expression is represented by a tree with operators as nodes and operands as leaves, and this approach could produce results more accurate than those of equation-solving programs such as Mathematica, Matlab, and Maple.
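The operator-node, operand-leaf representation can be sketched in a few lines of Python; the class and function names below are illustrative only, not the Facebook researchers' actual code, and the tree is flattened to a prefix token sequence of the kind a sequence-to-sequence model would consume.

```python
# Sketch of an expression tree: operators are internal nodes,
# operands (numbers and variables) are leaves.

class Node:
    def __init__(self, value, children=()):
        self.value = value              # operator symbol or operand token
        self.children = list(children)  # empty for leaves

def to_prefix(node):
    """Flatten the tree into a prefix (Polish-notation) token sequence."""
    tokens = [node.value]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens

# The expression 3*x + 5 as a tree: (+ (* 3 x) 5)
expr = Node("+", [Node("*", [Node("3"), Node("x")]), Node("5")])
print(to_prefix(expr))  # ['+', '*', '3', 'x', '5']
```

Because prefix order is unambiguous without parentheses, such sequences make convenient inputs and outputs for the translation-style network the text describes.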
Plato and Penrose believed that only a human could access the Heavenly world of mathematical forms, and indeed we have not yet found any non-human who could do mathematics. Lee Sedol and Ke Jie also believed that only humans could appreciate the “beauty” of Go; however, they both lost to AlphaGo the machine, showing that the “beauty” of Go does not absolutely imply success in practice.
It seems that the simple Turing Test of the ability to reason about mathematical forms could determine whether an AI machine can join those peculiar humans in their peculiar activity: The AI machine could be presented with some mathematical puzzles to see if it can solve them as well as or better in a competition with humans.
Ostensibly a simple and effective test, it is nonetheless plagued by the fact that relatively simple mathematical puzzles can be solved by tree-search, and solutions from logical ANDs would seem to be no more than a copying and collating test rather than proof of reasoning about mathematical forms. Nonetheless, much of mathematics actually is done by analogy with other mathematics problems, but in assessing the AI machine's mathematical ability, the separation of analogizing, reasoning, and creativity may well be as difficult as any math puzzle itself.
Perhaps a step down from doing mathematics, another test of intelligence may be based on the psychologist's view of intelligence as ability at “general cognitive problem-solving” where “cognitive” includes all the above listed attributes of intelligence, and the key word here is “general”.
That indeed has been the machine's bugbear: its lack of general-purpose ability that derives from a broad intelligence that can address and adapt to different conditions and situations. The criticism of machines and robots is that although they do some things very well, even better than humans, each machine can only do its own thing.
However, AlphaGoZero's appendage “zero” came from the idea that “less is more”, meaning that less complexity produces greater generalization; in accord with Apple's design philosophy that simplicity is the ultimate sophistication, a zero-human-input, randomly initialized deep policy and value network could produce a “thinking machine” that can perform ever higher-quality “policy iteration”.
Indeed, AlphaZero did quickly learn not only Go, but also checkers, chess, xiangqi, and shogi, and beat human champions in each of those board games after only a few hours of reinforcement and unsupervised learning.10
This naturally brings up the notion that in principle such a thinking machine embodying the artificial neural networks, tree-search and game theory algorithms could engage in any activity without any prior knowledge of the specifics of the activity and excel in that activity; isn’t that kind of take-on-all-challenges the essence of general adaptive intelligence?
So going beyond the mind games, AlphaZero was assigned the task of creating unique pharmaceutical drug molecules based on the principles of organic chemistry. However, AlphaZero's creativity led to drugs that were theoretically possible and decidedly innovative, but impossible or impracticable to combine and maintain, meaning that the machine went too far, something that humans often are guilty of as well. Of course, with some further intensive reinforcement learning from rewards and penalties for success and failure, AlphaZero should be able to come up with some useful new drugs, among them therapeutic cures and immunological vaccines for COVID-19.
The research problem then is to develop a general purpose robot comfortable in a variety of environments and possessing a number of different skills.
All the genius-level Go Masters and great chess Grandmasters were prodigies; for example, Korea's Lee Sedol was only 26 when he became a world champion, and the heir-apparent new world champion, China's Ke Jie, was only 18 at the time of his match with AlphaGo Master. The American Bobby Fischer won the World Chess Championship at 28, after startling performances from age 13; of course there is Russia's great Garry Kasparov, and the current World Champion, Norway's Magnus Carlsen, gained Grandmaster level at only 13. Of course, well-known child prodigies abound throughout the histories of mathematics, as well as music and art. So the touchstone of genius-level intelligence is likely something innately present in the organization of neurons in the human brain at birth, an innate and specific intelligence.
The brain material of geniuses and so-called idiot-savants has been dissected and analyzed to show synaptic neural networks densely concentrated in particular areas of the brain that process different extraordinary capabilities, just like densely hard-wired microprocessor ASICs.11
Analysis of Einstein's brain revealed atypical inter-hemispherical connections and an enlarged lateral sulcus allied to analytical/mathematical function. Observations of portraits of René Descartes and his pronounced frontal lobe bulge, associated with spatial and analytic perception, have been proposed as the source of his formulation of the graphic-analytic Cartesian coordinate system.12
Furthermore, neuroscientists have observed that individuals with high IQs and knowledge of a lot of facts have the enhanced ability to quickly recognize patterns to classify those facts; in effect the neuron hard-wiring and rapid synaptic neural connecting upon external stimulation are the source of their innate intelligence and later-acquired intellectualism.
Indeed, it has been found that during the process of deep thinking, the neural activation of the dorsolateral prefrontal cortex, allied with self-awareness and self-consciousness, is actually lower, while that of the medial prefrontal cortex, allied with stimulus-independent internal idea generation, is higher, thereby allowing an uninhibited “free-flow” of thought rapidly through synaptic patterns, essentially closing off the external environment to allow the unobstructed transmission of neural signals.13
Many have no doubt experienced that when deep in thought, for example doing a math problem, coding a computer program, analyzing a situation, writing an essay, business report, or legal brief, painting, or playing a difficult piece of music, one becomes oblivious to surroundings, and ideas surge from within; that is, from the medial at the expense of the dorsolateral prefrontal cortex.
This can explain why superior thinkers are often lost in a world of their own, and may not be aware of their environment and the consequences of their acts and words on others, thereby being perceived as lacking social grace. Lost in thought, they perversely display annoyance when that environment intrudes on their thinking.
Humans are stuck with the neurons and patterns that they have at birth, but can refine the synaptic flow through education, experience, and practice. The AI machine, however, through training sets and intense iterative reinforcement and unsupervised learning, can dynamically generate the synaptic neural connection network necessary for specific tasks, and refine the cognitive flows through the network.
A general-purpose AI machine then can be hard-wired as a top-down expert system for specific tasks, with more circuitry for the more difficult tasks. That hardware then can be driven by very deep neural networks and an ensemble of clever algorithms to deep-learn from the bottom up through reinforcement and unsupervised learning. Adding a central control unit, actuating sensors, feedback loops, and servo-mechanical constructs constitutes an extremely capable “robot for all seasons”.
This single AI robot in principle will be able to play chess and Go better than Kasparov and Ke Jie, compose and play music like Mozart, paint like Picasso, write like Tolstoy, do physics like Newton and chemistry like Lavoisier; and hopefully mathematics like Gauss and cosmology like Einstein.
Merely using available data to machine-learn is called “Weak AI”; using that data and being able to recognize and make inferences from it is called “Strong AI”. If Strong AI can equal or surpass human intelligence, it will mark the arrival of the AI Singularity, and challenge humankind's dominance of this Earth.
Alan Turing devised the first test of the AI singularity: a human and a robot respondent are placed behind a curtain, and a human questioner in front of the curtain asks many questions; the questioner is then asked which respondent is human and which is robotic. If the questioner is correct in half or fewer of his answers, it means that he cannot distinguish the robot's answers from the human's answers, and humankind has first encountered the AI singularity.1
This rather simplistic test was meant to be performed on the rudimentary computers of Turing's day. In a better test of equality of intelligence, conducted in the late Seventies, the Carnegie Mellon University computer program called BACON (in honor of Sir Francis) was given data about the motion of the planets around the Sun, and lo and behold, it came up with Kepler's Third Law: that the square of the orbital period of a planet is directly proportional to the cube of the semi-major axis of its elliptical orbit.
Stanford physicist Zhang Shoucheng in a Google Talk described the following scientific test: given data regarding naturally-occurring material interactions, could the robot provide an explanation for all the observed phenomena? And again lo and behold, the AI machine came up with Mendeleyev's Periodic Table of the Elements!2
The AI machine thus can see patterns in physical data and infer relationships, implying that it can do the scientific discovery of physics and chemistry, but can it “deductively find the relationships among concepts in a very compact form that gets to the heart of the matter”; that is, do mathematics?
Further, critics of the “proofs” of the AI Singularity point out that the machine had perfect data, but to come up with their theories, Kepler and Mendeleyev had to parse mountains of sometimes mistaken observations, erroneous conclusions, crack-pot and half-baked interpretations, and from masses of inchoate and amorphous data select the relevant from the irrelevant and contradictory, all in the face of doctrinal religious and wrong-headed science opposition, to finally produce their theories.
The machine could demonstrate intelligence equivalent to the best minds in science given perfect data, but if it could devise a theory from imperfect data that explained phenomena for which mankind has not yet found a theory, then the AI singularity has definitely entered mankind's intellectual house, and for the greater good.
But beyond the logic of natural phenomena, Kepler embodied the spirit of inquiry and the drive to understand essential to a good scientist who, unlike a machine, ponders the mysteries of Nature, of faith, or of both in the “natural light of reason”. Does a machine have an uninstructed spirit of inquiry to use that reason to discover the intelligible plan, and the courage of conviction to pursue it in the face of doctrinal opposition; and even if it does, to what end?
No less a philosopher than Nietzsche proclaimed this ascendancy of mankind's intellect derived from the human spirit:3
He stands proudly on the pyramid of the world-process; and while he lays the final stone of his knowledge, he seems to cry aloud to listening Nature: “We are at the top, we are at the top; we are the completion of Nature!”
Humans pursue the Truth in the spirit of enriching and bettering society; why should an AI robot do the same for the betterment of an insensate robotkind? Would robots have any motivation to improve their environment? Air and water pollution are of no concern to them, but extreme weather, earthquakes, and floods are, and with better sensing producing more data, analysis, and pattern recognition capability, robots will likely be more attuned to, for example, the dangers of climate change.
Will human scientific endeavor cease as inadequate and irrelevant after the AI Singularity and be replaced by the considerations of robot science? Do robots have any philosophical motivations, as Nietzsche had?
Will the AI robot of its own volition measure the skies, probe the Universe, produce great works of literature and art, seek the sublime ideals of philosophy and intellectualism, including the contemplation of human (and robot) consciousness and awareness as the better part of mankind has done? Will an intellectual robot continue to follow the productive course set by humans and produce new robots to allow old robots to enjoy more leisure time and pursue higher interests?
Will the uninstructed AI machine resolve the problems of disease, environmental degradation, species extinction, war, climate change, and inhumanity by itself? Why should robots redress what mankind has stupidly done? After subjugating mankind, will the singular AI robot display its own mechanomorphic stupidity in launching devastating internecine robot wars, just as humans have so cruelly done?
If the history of mankind's realization of superiority and its manifestation in the cruel, abusive, and exploitive treatment of Nature's animals is any guide, perhaps the best that can be hoped is that we humans won't be slaughtered by the AI robots; fortunately they have no need to consume us for nourishment, but they may find other uses for our bodies, such as fat for fuel and lubrication, bones for landfill, skin for basketballs and footballs, or gut for their robot tennis rackets.
What AI robots do with us will depend on their attitude towards us, and that attitude will be a function of their intelligence and how it is used, and poses the question: Does intelligence beget attitude, and if so what attitude?
Human attitude emanates from the 100 billion neurons, 100 billion glial cells, and 100 trillion connections among neurons of the human brain, and although artificial intelligence research has helped us to understand how a thought is developed in the hidden layers of artificial neural networks, because of the billions of backpropagation passes through the network, exactly when a thought was conceived cannot be fully known, and something as complex as an attitude makes the problem all the more difficult.
Furthermore, there is a serious problem: if the search for a theory of intelligence is being carried out by that very intelligence, can one probe and understand something by means of that very something? According to Kurt Gödel's Second Incompleteness Theorem, the full validity of any system cannot be demonstrated within that system itself; that is, human intelligence cannot be established and defined by humans using their own intelligence to study human intelligence. The completeness of a theory cannot be established unless there is something outside the frame of reference against which it can be tested, so it appears that the only legitimate testers of human intelligence are superhuman robots or aliens from another planet.
Anatomically modern humans’ intelligence has developed over only 200,000 years from hunter-gatherer existence to an understanding of those hunted and gathered life forms as they evolved in accord with Darwin's Origin of Species and Theory of Natural Selection, and human intelligence has ostensibly improved through natural selection, but this is merely human intelligence studying its evolution at its earlier stages and not modern intelligence per se.
After the AI Singularity, perhaps superior autonomous AI machines will teach humans to be even more intelligent to participate in the robot world, or perhaps their attitude towards us humans will devolve to just a minor problem of how to dispose of us in the most efficient manner.
A defense against this rather distressing end is to control artificial intelligence development within the rules of Asimov's Three Laws of Robotics: a robot may not injure a human being or, through inaction, allow a human being to come to harm; a robot must obey the orders given it by human beings except where such orders would conflict with the First Law; and a robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
However, the first law has already been violated by autonomous military drones finding and killing terrorist leaders, by an explosives-carrying police robot that pursued and killed an active mass shooter by detonation, and of course, fictionally, by the Terminator.4
Perhaps a more philosophical approach will help to assuage our fears: if the human mind is taken as Fontenelle's historical collective mind, that is, a well-cultivated mind containing all the minds of preceding centuries, and if it continues to iteratively develop, human intelligence will never degenerate, and perhaps will be able to compete with robot intelligence.5
However, our current investigation and construction of artificial intelligence is going off on a locus away from human intelligence to robot intelligence, and robots presently are just products of that human intelligence, so humans ostensibly should remain the primus inter pares, at least until a super smart robot is developed or intelligent beings from other worlds show us their surpassing intelligence.
The Search for Extra-Terrestrial Intelligence (SETI) has employed giant radio telescopes and multichannel scanners to analyze spectra from beings operating other-worldly transmitters. In this sense, “intelligent creatures can be defined merely as those with the means and inclination to engage in interstellar communication via electromagnetic waves”, and if they succeed in contacting us humans, they presumably have a superior intelligence, since they have contacted us and not we them. Perhaps they will tell us what “intelligence” apart from humans really is, thereby providing a basis for the AI Singularity.6
Many Western science historians have marked the beginning of the Scientific Revolution at 1543 with the publication of Copernicus' De Revolutionibus, an indeed revolutionary idea of the Earth revolving around the Sun; a rational heliocentric view replacing the self-absorbed homocentric regard of man's natural surroundings.
Johannes Kepler's Astronomia nova, published in 1609, presented a new ephemeris based on planetary orbits later derived analytically in Newton's Principia Mathematica and expounded upon in Leibniz's Specimen Dynamicum and Laplace's Traité de mécanique céleste.
The new scientific method of observation, hypothesis, and confirmation armed the natural philosophers of the day with new medicine, chemistry, mathematics, and physics, endowing their adherents with powers of healing and feats of astronomy and engineering that, having a predictive capability, attracted the patronage of many of the paranoid European rulers of the time. But the Age of Reason would not only amuse and strengthen the monarchs’ rule, it would enlighten the whole world through science.
The predictive power of science was founded on the calculus of Newton and Leibniz in the late 17th Century, a new mathematics that was to be employed first for the greater understanding of Nature and then for new constructs and weapons that would not only advance the power of nations, but also produce industries that would change human society forever.
The calculus of Newton and Leibniz can describe the rate of change of the position x of a body with time t, the first derivative being the velocity v; the rate of change of velocity with time, the second derivative of position, is the acceleration a. Thence, from Newton's second law F = ma, a force F applied to a body of mass m will accelerate that body,
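The relations in this paragraph can be sketched numerically: velocity as the first derivative of position, acceleration as the second, and Newton's second law tying force to mass and acceleration. The sample trajectory x(t) = t² and the mass are illustrative assumptions, not from the book.

```python
# Velocity v = dx/dt, acceleration a = dv/dt, and F = m*a,
# estimated by central finite differences.

def derivative(f, t, dt=1e-4):
    # central-difference estimate of df/dt
    return (f(t + dt) - f(t - dt)) / (2 * dt)

x = lambda t: t ** 2             # position (hypothetical trajectory)
v = lambda t: derivative(x, t)   # velocity  v = dx/dt = 2t
a = lambda t: derivative(v, t)   # acceleration a = dv/dt = 2

m = 3.0                          # mass (arbitrary)
F = m * a(1.0)                   # force producing this motion: F = m*a

print(v(1.0), a(1.0), F)         # approximately 2, 2, and 6
```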
Humankind thus could know precisely what force would produce a desired motion and, conversely, what acceleration a given force would produce, and design machines to those ends.
Furthermore, that force F times velocity v, when integrated over time t, gives the amount of work W done by that force over distance s; from this it could be known how much of that net work will generate how much kinetic energy and, conversely, how much kinetic energy is required for the machines to do that work,
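The work–energy relation described here can be checked numerically: integrating force times velocity over time gives the work W, which equals the kinetic energy (1/2)mv² the body acquires. The mass, force, and duration below are illustrative assumptions.

```python
# Constant force on a body starting from rest: W = ∫ F·v dt
# should equal the final kinetic energy (1/2)·m·v².

m, F, T = 2.0, 4.0, 5.0          # mass, constant force, total time (arbitrary)
a = F / m                        # acceleration from F = m*a
n = 100_000                      # integration steps
dt = T / n

# v(t) = a*t for constant acceleration from rest; sum F*v*dt over time
W = sum(F * (a * (i * dt)) * dt for i in range(n))

v_final = a * T                  # final velocity
KE = 0.5 * m * v_final ** 2      # kinetic energy at time T

print(W, KE)                     # the two agree to within the step size
```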
The “d” in the equations denotes a very small change, the fundamental idea of limits in calculus: for example, in going from A to B, continually halving the remaining distance gets one infinitely close to B without ever arriving, the distance becoming infinitesimally small. From this, changes in distance, time, velocity, force, work, energy, and so on can all be made infinitesimally small, allowing representations of dynamic systems and their changes to the finest detail.
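The going-from-A-to-B example can be made concrete: repeatedly halving the remaining distance gets arbitrarily close to B without ever arriving.

```python
# Starting at A = 0 and heading for B = 1, halve the remaining
# distance fifty times; the gap becomes vanishingly small.

position, remaining = 0.0, 1.0
for _ in range(50):
    remaining /= 2               # halve the remaining distance
    position = 1.0 - remaining   # move to the new halfway point

print(position, remaining)       # position is near 1 but never equals it
```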
For artificial intelligence, the calculus has been employed to minimize the error between the belief of an artificial neural network and the ground truth, allowing the network to learn, while the chain rule of calculus, linking changes (first derivatives) through the network's layers, pushes the network to improve its understanding. A system employing the calculus thus can learn and ultimately predict, and so can be said to be “intelligent” in its ability to predict through learning.
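A minimal sketch of the learning just described: a single linear "neuron" whose weight is adjusted by gradient descent so the error between its belief and the ground truth shrinks, with the derivative of the squared error supplied by the chain rule. The training data (y = 2x) and learning rate are illustrative assumptions, not from the book.

```python
# One linear neuron trained by gradient descent on y = 2x.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # ground truth: y = 2x
w = 0.0                                        # initial weight ("belief")
lr = 0.01                                      # learning rate

for epoch in range(200):
    for x, y in data:
        y_hat = w * x                # the network's prediction
        error = y_hat - y            # belief minus ground truth
        grad = 2 * error * x         # chain rule: d(error**2)/dw
        w -= lr * grad               # step against the gradient

print(round(w, 4))                   # approaches 2.0: the neuron has "learned"
```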
The Industrial Revolution of the late 18th Century in England was founded in large part on James Watt's steam engines as the drivers of the Newtonian machines that gradually replaced human, bovine, and equine labor. Faraday's experiments and Maxwell's mathematics showed that a turning magnetic field could generate an electric current in a stator and, conversely, that an electric current in the stator could produce a magnetic field to turn a rotor, giving rise to the idea that electric generators and driving motors could provide the energy and impetus for machines to do the work of manufacturing and of transporting people; this idea would quickly transform the life and society of the day, and ultimately drive machines that could “think”; that is, the computers and algorithms operating on Big Data.
Burning coal to boil water to produce steam to turn electromagnets to generate electricity in 1882 was scaled up by Edison and Tesla whose great steam-powered turbines began to produce electricity not only to run machines in factories, but also to light up those factories for work into the dark of night. Electricity as well lit the homes in London and New York for the reading, study, and leisure that would develop an urban culture that would generate a new industrial society.
In 1908, Henry Ford's assembly-line mass manufacture of petroleum-powered internal combustion engines for automobile transport thrust the world towards the crude oil-cracking industries of gasoline and plastics that would dominate the 20th Century.
The assembly line, however, relegated the erstwhile master craftsman to the mind-numbing routine of servicing the machines that now did the crafting. This led not only to labor unrest and subsequent social revolution, but advanced into the horrifying thought that the machine could be intelligent as well, and take over “thinking” from humans.
On the other hand, the saving grace was that although man serviced the machine at work, the machine would serve man at home: the refrigerator, washing machine, and vacuum cleaner saved time and labor, the radio provided information and entertainment, and the phonograph provided music for enjoyment. Later on, the computer would help humans do their work and marvelous electronic devices provide hitherto unimaginable communications and information; and artificial intelligence ultimately might do almost all the “thinking” for humans.
The portents of the European industrial revolution were duly noted in far-off Japan. After the 19th Century Meiji Restoration, the traditional social hierarchy of warrior, farmer, artisan, and merchant was turned on its head by the previously unthinkable: a titled samurai, honor-bound to the rigid frugality of his spiritual class, began taking up the mundane and at times venal affairs of civil administration and the materialism of commerce.
Iwasaki Yataro, the great-grandson of a revered samurai, in 1870 founded Japan's first keiretsu, Mitsubishi Heavy Industry and Shipbuilding. The builder of the agile Zero fighter planes and giant Yamato-class battleships was understandably disbanded by the Americans after World War II, but upon the Korean War, to counter the rise of communism by demonstrating the virtues of capitalism, Mitsubishi was reorganized and re-branded to become an electric appliance manufacturer.
The hearts of those appliances would beat from electricity, but their brains would soon depend on a new electronics based on the physics of quantum mechanical uncertainty. That is, because electrons could not concurrently have an absolutely determinable energy and position, they could probabilistically tunnel through an ostensibly insurmountable potential barrier in a semiconductor material, permitting current flow and amplification under the control of an electronic gate.
From this mostly Germanic physics, first America's Bell Labs invented the transistor in 1947, and then Texas Instruments and Fairchild Semiconductor in 1958 independently created an integrated circuit of transistors, propelling TI and Fairchild's successor Intel to semiconductor device dominance in the late 20th Century, and RCA and General Electric to the forefront of consumer electronics production using those semiconductors, albeit to be quickly overtaken by the design and miniaturization wizards at Japan's Sony and Toshiba.
In nearby South Korea, after the devastation of the Korean War, with American aid and under the autocratic leadership of President Park Chung-hee, the cozy relationship among the central government, the banks, and the family-run chaebols fostered the dominating emergence of the Big Four of Samsung, LG, Hyundai, and Daewoo in steel, shipbuilding, consumer electronics, and finally semiconductor fabrication.
At the same time, on the small island of Taiwan, taking full advantage of counter-communism American aid, the Republic of China began textile, plastics, and passive electronic component manufacturing for export, and the much-maligned but inexpensive and useful “Made in Taiwan” products flooded the US market: the fledgling precursors of a major electronics supply chain specializing in the mass production of semiconductor chips by TSMC, personal computers by Acer, and liquid crystal displays by Chimei, all the while bringing down prices so that people all over the world could utilize and enjoy the new electronics.
Japan's exquisite product design, together with South Korea and Taiwan's efficient mass production, brought high-tech products to the masses, the globalization lifting the tiger economies of East Asia, but driving the pioneering RCA, Westinghouse, General Electric, and Telefunken to consumer product desuetude.
America quickly made a comeback with Texas Instruments’ handy pocket calculator, the unprecedented computational power of IBM's mainframe computers, and Apple and IBM's personal computers, all of which together with the Asian tigers’ low-cost, high-efficiency production would change work and society all over the world.
The 20th Century thus saw the high-technology consumer electronics globalization paradigm at its best: European science, American invention, Japanese design, Korean and Taiwanese production. And all the while, Asia's sleeping giant, the People's Republic of China, mired in a regressive Cultural Revolution, was left out in the cold.
Later in the Century, American research universities and the innovative spirit of Route 128 in the east and Silicon Valley in the west attracted engineers and entrepreneurs from all over America, and then the world, particularly India, China, Russia, and the Middle East. The new information technology companies Yahoo, Google, Facebook, and Amazon quickly rose to Internet commercial dominance, and the personal computer and smartphone communications began to generate the Big Data that empowered the rise of modern artificial intelligence in America, and finally awoke the sleeping dragon China with the new AI-intensive tech companies Alibaba, Baidu, and Tencent leading the way.
The seminal changes in industry and society of the late 20th Century were propelled by the computer; the middle of this Century will be dominated by the implementation of artificial neural networks, driven by clever new algorithms parsing the ever-growing Big Data derived from the ever-shrinking but ever more powerful computers in the smartphone.
As artificial neural networks are totally dependent on the constructs and calculations performed by computers, artificial intelligence depends on the computer just as human intelligence depends on the human brain.
Computer processing can be said to have started centuries ago, first with the idea of logarithms for analog mechanical computation, then binary numbers and Boolean logic for semiconductor logic, and finally integrated circuit switching electronics for fast and massive computing.
The brain is demonstrably more sensitive to proportion than to absolute difference when large factors are involved, so a logarithmic difference of an order of magnitude (10×) is more easily perceived than, say, the unit differences of a centimeter scale. It thus seems natural that the fundamental operations of a brain-mimicking computer would be based on logarithmic scales, which have the further advantages of conveniently performing multiplication and division and increasing the range of computation.
The modern idea of logarithms was formulated in the early 17th Century by Scotland's John Napier as an aid for doing mathematical calculations. The logarithm of a number is the exponent to which another number, called the base, must be raised to produce that original number, thus allowing practically all numbers to be represented by exponents. For example, the log of 10 to the base 10 is just 1.0 (log10(10) = 1), and very large numbers can be represented compactly as exponents of the base 10 (for example, 1,000,000 = 10^6; just count the zeros). Furthermore, multiplication and division can be performed by adding and subtracting logarithms, leading to very convenient analog computation, first by mechanical calculators, then the engineer's slide rule, and eventually digital computation by computers.
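The convenience Napier's logarithms provide can be sketched numerically: multiplying two numbers reduces to adding their logarithms, the principle behind logarithm tables and the slide rule. The sample values are illustrative.

```python
# Multiplication via addition in log space: log(a*b) = log(a) + log(b).

import math

a, b = 1000.0, 100000.0                    # 10**3 and 10**5
log_sum = math.log10(a) + math.log10(b)    # 3 + 5 = 8
product = 10 ** log_sum                    # back out of log space

print(log_sum, product)                    # approximately 8 and 1e8, i.e. a * b
```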
Two luminaries of the 17th Century Scientific Revolution, the French mathematician Blaise Pascal and the German Gottfried Wilhelm Leibniz designed hand-cranked cogwheel adding machines that could also logarithmically multiply and divide, the mechanical principles of which would drive calculators for the next 300 years.
These calculating machines would later be controlled by the principles of Joseph Marie Jacquard's automatic weaving loom, which used a deck of punch-hole cards read by movable rods to interweave different colored threads for cloth manufacture. Different sequences of cards could produce different designs, the Jacquard loom thus constituting, as early as 1804, the first programmable weaving robot, one that would soon be promoted to scientific and engineering work.
In 1822, England's Charles Babbage designed a calculating machine with numbers etched on interacting cogs and wheels that by cranking could step-by-step iterate the small differences in derivatives with respect to an independent variable to produce solutions of differential equations for engineering tables. In 1834, he expanded the scope of his Difference Engine with a locomotive-sized, steam-powered mechanical calculator having a “store” that could hold one hundred 40-digit numbers etched on those cogs and wheels, and a “mill” that could fetch the numbers and perform calculations such as iterations (do-loops), conditionals (if-then), and transfers (go-to), requiring many, many complex mechanical interactions. This was the beginning of computers that could do calculus, be programmed, and handle large amounts of data, the life-blood of artificial intelligence machines.
It was Babbage's assistant, the lovely daughter of the great poet Lord Byron, who, using Jacquard's punch cards devised the first ordered sequences of the above logical operations for the calculations of Babbage's Analytical Engine. The self-taught mathematician Ada, Countess of Lovelace in 1843 thus was history's first computer programmer.1
The Countess’ punch-card programs designed to control Babbage's Analytical Engine could solve different differential equation problems just as the woven cloth designs produced by Jacquard's loom could be changed simply by changing the order of the punch-cards in the stack, so as the Countess wrote in her operational instructions,2
We may say most aptly that the Analytical Engine weaves algebraical patterns just as the Jacquard loom weaves flowers and leaves
Alas, Babbage's Analytical Engine was never built for lack of funding, and humankind thus had to wait more than one hundred years for Vannevar Bush's analog Differential Analyzer, which still required cogwheels whose shaft rotations recorded small-difference iterations for solving differential equations, and which still had to be set up laboriously by hand using screwdrivers and wrenches.
Bush's cranked mechanical shaft rotations were gratefully replaced by Lee De Forest's vacuum-tube triode amplifiers and George Philbrick's electric voltages in 1938, thereby increasing computation speed and greatly reducing set-up time. The results were displayed on an oscilloscope, signaling the arrival of the graphical representation of electronic scientific and engineering calculations.
Bush's Differential Analyzer was employed in the War effort, but it required ten different logical states to represent the decimals 0 to 9 and depended on the often unstable and noisy analog voltages of vacuum tubes to perform its operations.
The conceptual breakthrough leading to today's digital computers was recorded in George Boole's 1847 book, The Mathematical Analysis of Logic, wherein Boole revisited Leibniz’ study of ancient China's I Ching divination, in which all under Heaven are dualities: dark and light, male and female, up and down, left and right, good and evil, and so on. Following Napier, Boole formulated a base-2 system, and with only those two states laid the foundations for the binary logic of digital computing, replacing decimal calculation with the eponymous Boolean algebra.
In contrast to base-10 decimal numbers, where each position to the left of the decimal point increases by a power of 10, each place to the left in a base-2 binary system represents an increasing power of 2, with “1” signifying the devoutly religious Leibniz’ active “God” existence and “0” the inactive “Void” of I Ching mysticism. For example, 8 is represented by the binary sequence 1000: read from right to left, only the fourth place is “on”, so 2^3 = 8.
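The right-to-left place-value reading just described can be sketched in code: each place in a binary string is a higher power of 2, so "1000" has only its fourth place "on", giving 2³ = 8. The helper function name is illustrative.

```python
# Decode a binary string by summing the powers of 2 whose places are "on".

def from_binary(bits: str) -> int:
    total = 0
    for place, bit in enumerate(reversed(bits)):   # read right to left
        if bit == "1":                             # this place is "on"
            total += 2 ** place
    return total

print(from_binary("1000"))   # 8
print(from_binary("1011"))   # 8 + 0 + 2 + 1 = 11
```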
The basic binary logical operations of AND, OR, and NOT in combination can express all arithmetic operations, and the NOR, NAND, XOR operations can make operations simpler. Arrays of these logic gates can add, subtract, multiply, divide, compare, classify, and perform all the basic mathematical and logical operations.
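The claim that arrays of basic gates can perform arithmetic can be sketched with the classic half adder: XOR and AND gates alone produce the sum and carry bits of one-bit binary addition. The function names are illustrative, not from the book.

```python
# A half adder built purely from XOR and AND gates.

def AND(a, b):
    return a & b

def XOR(a, b):
    return a ^ b

def half_adder(a, b):
    """Add two one-bit numbers, returning (sum_bit, carry_bit)."""
    return XOR(a, b), AND(a, b)

for a in (0, 1):
    for b in (0, 1):
        s, carry = half_adder(a, b)
        print(f"{a} + {b} = carry {carry}, sum {s}")
```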
Logic gates receive high or low coded pulses and, through cascades of gates, produce high and low voltages that allow current to flow or not flow according to the truth tables of Boolean algebra, producing the logical outcomes required by programmed computer computations.
The implementation breakthrough came from Vannevar Bush's graduate student Claude Shannon. He noted that since the binary representation of any number is a string of binary bits, the two-state on/off logic of electronic switches could easily represent all numbers; for example, the number 8 (binary 1000) can be set by binary switches as (right to left) off, off, off, on, and combinations of bits could form bytes of information. Combinations of switches acting as gates could thus control logical operations represented by truth tables of True (on, high) or False (off, low), leading to a desired logical cascade that represented the results of the computation.
Mathematical calculations thus could be very efficiently performed solely by simple electronic switches, as Shannon outlined in his 1938 MIT master's thesis entitled A Symbolic Analysis of Relay and Switching Circuits, proving that it is after all possible for a graduate student thesis to have some value.
Analog signals such as voice and music are digitized by sequentially sampling the voltage signal at small intervals, giving each sample a decimal value that is translated into binary form by an analog-to-digital converter (ADC) and decoded back into analog voltages by a DAC for playback. Because each sound is assigned a number, the recording is not subject to noise or distortion; that is why the music on digital CDs sounds better than that on analog tapes.
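The digitization just described can be sketched numerically: sample an analog waveform at regular intervals, assign each sample an integer code as an ADC does, then map codes back to voltages as a DAC would. The 3-bit resolution and sine-wave signal are illustrative assumptions.

```python
# Quantize one cycle of a 0-1 V sine wave with a hypothetical 3-bit ADC.

import math

levels = 8                                 # a 3-bit ADC: codes 0..7
samples = [0.5 + 0.5 * math.sin(2 * math.pi * t / 16) for t in range(16)]

# ADC: assign each voltage an integer code (clamped to the top code)
codes = [min(levels - 1, int(v * levels)) for v in samples]

# DAC playback: map each code back to the midpoint of its voltage band
decoded = [(c + 0.5) / levels for c in codes]

print(codes)   # integer codes survive small noise that would garble analog tape
```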
The theoretical breakthrough for computing hardware was set forth in Alan Turing's 1936 Cambridge University research paper, On Computable Numbers, which defined binary logical operations that could be sequentially recorded and stored on a theoretically infinitely long paper tape to constitute what he called a universal machine, one that could in principle compute any solvable problem, and therefore in principle could also “think” electronically; it was thus the first conception of a machine-generated artificial intelligence.
As proof of this intelligence, there was the deciphering of the Nazi secret code Enigma during World War II by Turing and the “Backroom Boys” at British Intelligence's Bletchley Park with their 2,000-vacuum-tube Colossus. Intercepted German messages were encoded and punched into paper tape, fed to a photoelectric reader that iteratively scanned at a then astonishing rate of 5,000 characters per second, and compared with the Enigma codes captured by the Polish resistance to find a match. Even though the Germans scrambled messages by systematically rotating alphanumeric rotors and changing the plug settings and keys of the encoding machine three times a day (the German receiver would know the settings), the “Backroom Boys” broke the code, and as one of the Boys said,3
I won’t say what Turing did made us win the war, but I daresay we might have lost it without him.
A Turing Machine running Boolean algebra on electronic circuits was constructed in 1942 at Iowa State College by John Atanasoff. The prototype's vacuum-tube logic and capacitor memory would later be cited as the prior art that in 1974 invalidated the ENIAC computer patent, and so laid legal claim to be the world's first digital electronic computer.4
Jacquard's 1804 weaving loom and Countess Ada's punch-cards would reappear in 1890 in a tabulator that read the holes in the punch-cards by trailing them under metal brushes and over a bath of electrically conductive mercury so that when the brushes penetrated a hole, a circuit between the brushes and the mercury was closed producing a signal to add to a counter. Used for tabulating census data for the United States Census Bureau, the effort took only one-third the time of the last census to count, sort, and statistically analyze a population that had increased by 13 million to almost 63 million citizens.
The inventor, Herman Hollerith, would later start a machine tabulating business that many years later would grow under Thomas Watson Sr. to become the giant IBM that produced mainframe computers using the eponymous Hollerith punch cards to enter data and programming instructions for batch-mode computer processing.
Despite being a consummate profit–maximizing businessman, Watson Sr. was not immune to government calls to assist in the War effort and, as he put it, “making a virtue of necessity”, he publicly vowed that IBM would never make more than a 1% profit from its government work. Watson Sr. then donated punch-card tabulators for the Army's ballistic table calculations, and critically, when the physicist Hans Bethe's equations for nuclear fission bomb designs could not be solved at Los Alamos, IBM provided the computing machines for the top-secret Manhattan Project development of the Atomic Bomb.
With the ever-increasing sophistication of computing machines, IBM agreed to a joint venture with Harvard University's Howard Aiken, who had conceived the design of a scientific computer that was fully automatic, capable of handling positive and negative numbers, carrying out calculations in a natural mathematical sequence, and utilizing a variety of mathematical functions, such as the sines and cosines of trigonometry.
The Harvard Mark I was a decimal-coded, 50-foot long monster with 3,304 relays, 500 miles of wire, and 750,000 electromechanical switches which clattered like “a roomful of old ladies knitting away with steel needles” while it crunched numbers up to 23 digits long, added three 8-digit numbers in a second, subtracted in 3/10 of a second and multiplied in 3 seconds. Using the progression of reading data from continuous paper tape punch holes instead of separate punch-cards, it could perform calculations in a single day that formerly took months.
Its first job was to calculate ballistic trajectories for the Navy, and what with the swaying of ship-borne guns on an unstable sea, an extremely difficult problem to begin with was made almost intractable by the sea winds buffeting the shells.
When shipped to the Navy for operations, Watson Sr. the inveterate salesman saw an opportunity to enhance IBM's image with a sleek, gleaming steel and glass casing for the Mark I, contrary to Aiken's plan for an open frame exposing the workings for easier monitoring and adjustments.
In the introduction of the new machine to the press in 1944, Aiken barely mentioned IBM, and the announced “Harvard” surname for the Mark I further rankled Watson as diminishing IBM's role in the development of the machine and depriving the company of much sought-after publicity. The Harvard Mark I, born in acrimony, nonetheless performed admirably for the Navy, but personal animosity followed both of the midwives to their graves.
Meanwhile in the land war, artillery pieces firing on the North African shore against Rommel's Afrikakorps were recoiling into the soft sand throwing off their aim. New firing tables were urgently needed. The War Department's Ballistic Research Laboratory at the Aberdeen Proving Grounds found revising the calculations beyond their differential analyzers’ capabilities, so a branch was set up at the University of Pennsylvania, and the Army awarded the Moore School of Electrical Engineering $400,000 to do the new firing tables calculations.
The director of the effort, John Mauchly, had visited John Atanasoff at Iowa State to see his prototype computer, and in the fog of invention conception, he and Presper Eckert designed the 80-foot-long, 30-ton Electronic Numerical Integrator and Computer (ENIAC), employing, like Atanasoff, capacitors for memory and vacuum tubes for logic, and able to add a thousand times faster than the Harvard Mark I.
However, when work was finally completed, ENIAC's unveiling found a country no longer at war, obviating the immediate need for soft-terrain artillery firing tables, but it nonetheless could compute new, more accurate firing tables in 20 seconds, less than the time for the shell to reach target.
A big-time test soon took the place of the firing tables: the development of the Hydrogen Bomb for the incipient Cold War required many, many calculations of controlled fusion reactions, and since ENIAC's vacuum tubes could switch a thousand times faster than the electromechanical switches of the Harvard Mark I, it was called upon to serve its country in a cold rather than a hot war.
The H-Bomb designs of Edward Teller and the Monte Carlo calculations of Stanislaw Ulam were spot-on, Elugelab island in the Enewetak Atoll was obliterated, and ENIAC was sanctified in the miasma of vaporized coral reefs in 1952. ENIAC, although obsolete well before The Super detonated, was the first mainframe “general-purpose” computer (artillery shell firing tables to nuclear fusion bombs), but it was neither the first, and surely not the last, technology created to destroy that ultimately served the good of humankind.
The mainframe computer was now seen as benign but with a cool reverence-inducing capability operating with unnerving detachment, as described by a reporter witnessing IBM's Selective Sequence Electronic Calculator (SSEC) calculating a high-precision lunar ephemeris at IBM Headquarters in New York City in 1948:5
There is the quiet clicking of printers, the steady shuffling of punched cards, the occasional rotation of a drum with memory tape, and a continual dance of little red lights as number-indicating tubes flick on and off in far less time than the twinkling of an eye. All else is hushed, and even the operators speak quietly in this streamlined sanctuary.
The SSEC was one of a succession of acronymic computers, ILLIAC, IAS, MANIAC, ENIAC, EDVAC, EDSAC, UNIVAC, and BINAC, each contributing to the long march of automatic computing technology and each experiencing a heyday, but all quickly lapsing to desuetude after being outperformed by new designs and better technology. In this sense, Watson Sr. need not have lamented IBM's lost publicity for the Mark I whose crude electromechanical switches became a symbol of backwardness in light of the ENIAC's fast vacuum tube switches.
Although a seminal advance, ENIAC had the shortcoming of a base-10 decimal architecture requiring 17,468 vacuum tubes that handled 100,000 pulses per second, such that there were more than 1.7 billion chances of a tube failure every second. So ENIAC of necessity ran at low voltages with great fans for air-cooling, reducing failures to one or two per week, but large-scale computing generally requires sequential operations, so the already tedious search for a burnt-out vacuum tube among the thousands of plug-in modules, once it succeeded, dismayingly meant re-starting the interrupted calculation from scratch.
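The failure arithmetic above is worth checking; a one-line computation using the tube count and pulse rate given in the text:

```python
# ENIAC failure-opportunity arithmetic, using the figures quoted above.
tubes = 17_468               # vacuum tubes in ENIAC
pulses_per_second = 100_000  # pulses each tube handled per second

# Every pulse through every tube is one chance for a failure.
chances_per_second = tubes * pulses_per_second
print(f"{chances_per_second:,}")  # 1,746,800,000 -- more than 1.7 billion
```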
Furthermore, ENIAC's “general-purpose” appellative was misleading, for its internal storage could only hold the numbers it needed for the calculations at hand. This meant that particular operations had to be connected within the circuitry by hand-plugging and unplugging hundreds of wires for each different computing task, a tedious, tiresome, and error-prone procedure that often took days to complete.
ENIAC's successor, the Electronic Discrete Variable Automatic Computer (EDVAC), operated in (discrete) binary logic rather than decimal, reducing the number of vacuum tubes, and adopted the von Neumann architecture used in the Institute for Advanced Study (IAS) computer, which stored programs in an expanded electronic memory with automatic fetching replacing ENIAC's primitive hand-plugging.
In 1945, the amiable bon vivant master of all technical disciplines, John von Neumann, in his famous First Draft of a Report on the EDVAC, had divided computer operations into a processor comprising a central arithmetic logic unit (ALU) and state registers, a central control unit with an instruction register and program counter, memory that stores instructions and data, external mass storage units, and input and output registers, altogether defining the serial architecture of the modern central processing unit memory-fetch digital computer that is still in use in most computers today. In today's artificial intelligence computers, however, the von Neumann architecture's serial processing would be replaced by the much faster massively-parallel processing required for operating on the Big Data matrices of artificial intelligence computation.6
One of these von Neumann machines was Cambridge University's Electronic Delay Storage Automatic Calculator (EDSAC) that was ironically completed two years before the EDVAC in 1949 because of turmoil at the Moore School over Mauchly and Eckert's belief that von Neumann was overly credited for their development of EDVAC.
With thoughts of striking it rich to assuage hurt feelings, Mauchly and Eckert left the University of Pennsylvania because of its policy that employees should not benefit financially from research performed at the University. They formed their own company to produce a truly general-purpose Universal Automatic Computer (UNIVAC) using magnetic tape for high-speed programming and data input.
The design was sound, but lacking business acumen, their financially distressed company was sold to Remington Rand in 1950. Mauchly and Eckert each made only about $300,000 from the sale and patent royalties.
Under the professional management of Remington Rand, UNIVAC was a commercial success as the first large-scale truly general-purpose computer. A total of 46 machines were sold to government and industry, one of which predicted the winner of the 1952 American presidential election, a feat that created an aura of machine intelligence that could awe the public, in contrast to the esoteric ENIAC Monte Carlo simulations that were by law veiled in the secrecy surrounding the development of the Hydrogen Bomb.
UNIVAC's lead in commercial mainframe computer sales clearly irked Watson Sr. He ordered an accelerated development of IBM's entry into the general-purpose computer market, the IBM 701, the very machine that Arthur Samuel was adapting to play checkers. The 701 would later replace the SSEC in IBM's New York showroom in 1952.
For computer memory, Fred Williams at Manchester University developed a cathode ray tube whose beam painted binary dots and dashes on a phosphor screen, read by a scanning electron beam that stored a distinctive current on a collector plate to represent data in memory that could be randomly accessed. This RAM display allowed a machine to fetch programs and data on command, and Watson Sr.'s IBM 701 immediately used the Williams Tube to gain an edge on Remington Rand's UNIVAC.
For large amounts of data, An Wang's pulse-transfer controlling device is generally cited as the prototype of the magnetic core memory, whereby networks of ferrite cores stored binary bits specified by the circulation direction of a magnetization produced by coaxial currents of opposite circular direction; core memory provided the massive data storage for the later IBM 704 and 705 computers.
A then commercially disinterested Harvard University allowed Wang to obtain personal patent rights, which he used to establish the eponymous Wang Labs in 1951 that pioneered the earliest desktop calculators, word processors, and scientific and business minicomputers.
Progress was being made on all fronts and the computer age was looming clear upon the horizon, but the dawn was held back by the bulky, hot-running, failure-prone triode vacuum tubes. Although good enough for radios, where a millionth-of-a-second short circuit would have no noticeable effect, they would cause crash-inducing errors in the non-stop operations of electronic computing.
Originally researched for AT&T telecommunications and modeled after the familiar “cat's whisker” crystal radio sets, the germanium spring-loaded point-contact transistor invented in 1947 by John Bardeen and Walter Brattain at Bell Labs could coolly take over the vacuum tube's work both as a stable amplifier and switch at a fraction of the size while emitting far less heat, the bane of all electronic devices. It was successfully utilized in radios and calculators, shrinking their sizes and increasing their useful lives by orders of magnitude.
However, the spring-loaded contact was fragile and difficult to mass produce, so the irascible head of the Bell Labs transistor team, William Shockley, driven by his dismay at being one-upped by underlings Bardeen and Brattain, in the same year developed a more robust germanium flat-interface junction transistor.
But although only 8 × 10−4 ounce was used per transistor, germanium was practically available only from coal fly-ash and as a by-product of zinc, silver, lead, and copper ore refining, and it cost more per pound than gold. The $8 price of a germanium junction transistor, compared to $0.75 for a vacuum tube and pennies for hordes of resistors, inductors, switches, and capacitors, inhibited its widespread use.
Although germanium has a higher electron/hole mobility, its chemical cousin silicon has a larger operational band gap, is more stable at higher temperatures, and is as abundant as the grains of sand on the beach. But pure crystalline silicon was difficult to produce, and the injection of the minority-carrier dopants necessary for semiconductor function was difficult because the surface of silicon became rough and brittle from differential thermal expansion of the materials.
The breakthrough came from the impossible-to-pronounce Czochralski crystal-growth puller process that produced 99% pure crystalline silicon ingots just right for semiconductors. When Bell Labs tried doping the pure silicon in a hydrogen gas atmosphere, it accidentally ignited, and because of silicon's high chemical affinity for oxygen, a smooth SiO2 (glass) coated the silicon. SiO2 is a natural insulator through which holes could be etched by hydrogen fluoride to inject the dopants smoothly into the silicon to provide semi-conduction.
Gordon Teal at Texas Instruments in 1953 engineered the Bell Labs process for industrial production and gradually lowered the cost of silicon junction transistors to $2.50, opening the door for their wider use, starting with TI's popular rugged little transistor radios and pocket calculators.7
The price of transistors, however, was still far above that of vacuum tubes, and for large-scale electronics requiring many vacuum tubes, the high per unit costs inhibited transistor use, but the cost-overrun insouciance of the US military and the National Aeronautics and Space Administration fortuitously came to the rescue.
Avionics, weapons, electronic warfare, and computers for space exploration with all their inherent risks called for electronic components that were robust, reliable, cool-running, long-lasting, low maintenance, small and light; just what the silicon junction transistor was all about. With guaranteed government sales, the economies of scale soon lowered the unit costs and propelled the transistor industry take-off, literally from war into space and then to the consumer marketplace, where the benefits of the junction transistor were soon clear to all in the sleek miniaturization of beautiful new consumer electronics products, and finally to computers.
The transistorized products and their ever-increasing features, however, required ever-more complex circuitry with much greater numbers of transistors and passive components, and the ever-increasing power of the acronymic computers could be unleashed only by thousands of transistors taking the place of vacuum tubes.
The transistors were batch-processed by photoengraving on large wafers of silicon and then cut apart into individual transistors, only to be soldered to other components on circuit boards inside the devices. Although expertly wired and soldered by the nimble fingers of legions of young women, the sheer number of wires in the ever-smaller devices was fast becoming intractable, particularly the hundreds of thousands of wiring connections for the tens of thousands of transistors required by the ever more powerful and versatile mainframe computers.8
The great potential of the digital computer was being literally strangled by the tangle of wiring connecting the transistors; a “tyranny of numbers” suppressing the freedom of the great new machines.
Coming to liberate the computers was not a revolutionary leader, but the chefs of the Italian culinary arts. To avoid the tangled webs of wiring spaghetti connecting the transistor meatballs, one could simply make lasagna instead.9
The active silicon semiconductor transistor meatballs and the connecting-wire spaghetti could be integrated into meat layer components and a layer of metal wiring pasta separated by layers of silicon dioxide cheese.
The active transistors and passive resistors and capacitors were all fabricated from single semiconductor “mesas” on a desert substrate, a monolith of all the different components into a single body.10
Thus in 1958, with a nod to showmanship, Jack Kilby turned on his monolithic phase-shift oscillator that converted a DC into an AC signal and easily impressed his Texas Instruments bosses as they watched the straight-line voltage dramatically change to a sine wave on the in-line oscilloscope display.
Funding was immediate, but in the rush to file a patent application, Kilby used a “flying wire” drawing with all the gangly wiring connections exposed, and his prototype did not properly claim any integrated wiring connections.
In the same year, those flying wires were evaporated and embedded as a metal layer that seeped through tiny holes pre-etched in the SiO2 insulating layer to form printed circuit conduction channels for the planar semiconductor layers. Robert Noyce at Fairchild Semiconductor thus could demonstrate a two-state flip-flop integrated circuit ideal for computers that not only obviated the tedious soldering of thousands of wires, but with the micron distances between components and electrical signals traveling at about half the speed of light, the flip-flop switch heralded the age of the superfast integrated circuit.11
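The two-state behavior that made the flip-flop ideal for computers can be sketched in software; this is a toy model of a cross-coupled NOR latch (a logical abstraction, not Noyce's actual circuit), iterated until its feedback loop settles:

```python
def nor(a: int, b: int) -> int:
    """NOR gate: output is 1 only when both inputs are 0."""
    return 0 if (a or b) else 1

def sr_latch(s: int, r: int, q: int = 0, qbar: int = 1):
    """Cross-coupled NOR latch: S=1 stores a 1, R=1 stores a 0."""
    for _ in range(4):  # iterate until the feedback loop settles
        q, qbar = nor(r, qbar), nor(s, q)
    return q, qbar

q, qbar = sr_latch(s=1, r=0)        # set -> stores a 1
q, qbar = sr_latch(0, 0, q, qbar)   # inputs released -> state is held
print(q, qbar)  # 1 0
```

Once set, the latch holds its bit even with both inputs released, which is exactly the one-bit memory a digital computer needs.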
Quickly, computer memory capacity increased by orders of magnitude with TI's dynamic and static random access memory (DRAM and SRAM) and Fairchild and later Intel's central processing units (CPUs) that respectively would become the silos and workhorses of the new personal computers that were looming on the horizon.
The digital computer was at last free from its wiring bondage, millions of transistors and components now could be integrated and inserted on motherboards as complete electronic entities. The sizes of the switches and amplifiers progressively shrank from the relatively massive, hot glass-walled triode vacuum tubes to the cool point-contact transistor, to the junction transistor, and finally to the completely integrated transistor circuits.
The tyranny of numbers was overcome by a monolithic integration, and the legions of skilled young women solderers spawned by the new technology of transistors were set free by the newer technology of automated semiconductor planar-process fabrication in an oft-repeated tale of machines replacing humans.
Invented in 1963, the soon-to-be dominant low-noise, cool-running, zero-static-power (drawing current only when switching between 0 and 1), thin-gate Complementary Metal-Oxide Semiconductor (CMOS), which “complemented” n-type (NMOS) and p-type (PMOS) transistors, could be formed under dense design rules on a wafer, bringing down sizes and prices in obedience to Moore's Law.12
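The zero-static-power property follows from the complementary arrangement itself; a minimal logical sketch (not device physics): the p-type transistor conducts when the input is 0 and the n-type when it is 1, so at rest there is never a conducting path from supply to ground.

```python
def cmos_inverter(vin: int) -> int:
    """Toy CMOS inverter: exactly one of the complementary pair
    conducts at rest, so no static current flows."""
    pmos_on = (vin == 0)  # p-type conducts when the input is low
    nmos_on = (vin == 1)  # n-type conducts when the input is high
    assert pmos_on != nmos_on  # never both on: no supply-to-ground path
    return 1 if pmos_on else 0  # output pulled up to supply or down to ground

print([cmos_inverter(v) for v in (0, 1)])  # [1, 0]
```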
The CMOS would soon become the workhorse semiconductor for almost all the integrated circuits employed in personal computers, and, in the thin-film version used in liquid crystal displays, it made possible the notebook computer, flat-panel monitors, and wall-hanging television sets. CMOS sensors also replaced the more expensive Charge-Coupled Devices (CCDs) in optical sensors and digital cameras, and indeed the LCD made possible the mobile phone, which provided not only continuous telephone contact but also instant information, music, and ride-hailing, all the while amassing Big Data for the training sets of artificial intelligence.
In the late 1960s, after soundly defeating Remington Rand and fending off local rivals Control Data Corporation (CDC), Digital Equipment Corporation (DEC), and Amdahl, then Japan's NEC and Fujitsu, IBM's System/360 integrated circuit computers extended its mainframe hegemony to all corners of the world.
Off the silicon shores of Northern California, however, a sea change soon would not only downsize computers, but also IBM itself.
The world's first microprocessor, the 2,250-transistor, 4-bit Intel 4004, was made for the Japanese calculator maker Busicom in 1971. A later version in 1972, the 8008 microprocessor, arranged 8-bit bytes into 256 unique arrays of ones and zeros that could handle the ten numerical digits, all the letters of the alphabet, punctuation marks, and other symbols, and after conversion to NMOS, the faster 8080 microprocessor, together with its competitor Zilog's Z80, was the heart of home-built computer kits such as the Altair for teenage boys to tinker with.
Among those older boys, Steve Wozniak built the first general-purpose, compact, stand-alone home computer in 1976, and Steve Jobs promoted the derivative Apple II so aggressively as to force a reluctant IBM to join the personal computer parade with its PC, an open system effectively mandated by fear of Antitrust investigation, and so run by CPUs from Intel, memory from TI, and an operating system from start-up Microsoft.
The open system allowed cloning by Compaq and price reduction by new manufacturers such as Acer in Taiwan, who sold PCs all over the world under their own brands but mostly as OEMs for almost all the other brands, including IBM, Dell, HP, Toshiba, and Sony.13
While the capabilities of even super-thin notebook computers have already far surpassed the bulky early acronymic mainframes, much greater computer speed and power are required for the massively-parallel calculations of the multi-parameter, interdependent, non-linear differential equations of physical and chemical quantum mechanics, astrophysics, fluid dynamics, weather prediction, physical and engineering simulations, and indeed the Big Data of artificial intelligence.
More recently, supercomputers have been used in molecular and genetic modeling, cryptanalysis, cosmological calculations of the beginning of the Universe, and now crucially in biological system simulations of viruses and in climate change models; they can, and likely soon will, be used in overarching artificial intelligence.
For example, solving the equations of stellar evolution for a star from birth to death would take an astrophysicist 3,000 years, while a supercomputer can do it in seconds. Closer to home, simulations of nuclear weapons explosions could allow nuclear-armed belligerents to just exchange simulation print-outs and computer graphics to determine whose bomb was more destructive, bringing warfare from the battlefield to the laboratory, if only world leaders knew something about science.14
In the 1970s, Control Data Corporation and Cray Research began the race for the title of world's fastest computer. Initially virtually alone in the field, CDC and Cray met competition first from Japan's NEC and Fujitsu, then later from China's Sunway and Tianhe, the latter two recently taking turns with America's IBM Summit and Sierra in the race for global supercomputer supremacy.
Summit regained the crown from Sunway in 2018 with an HPL benchmark of 122.3 petaflops, but China's supercomputers claimed 227 of the top 500 in the 2018 supercomputer rankings, which also included supercomputers from Japan, France, and Germany. While performing useful scientific and engineering simulations, the supercomputers have also become symbols of national technological prowess.
The massively-parallel processor cores running the supercomputers were primarily from IBM, Intel, Sunway, Fujitsu, ARM, and Nvidia, and the design architectures were from the individual supercomputer entities which included many universities, research institutions, and semi-public concerns.15
These fantastical speeds and capabilities, combined with Big Data and deep artificial neural networks cannot help but evoke disturbing images of a supercomputer controlling a vast army of general-purpose robots running rampant over human civilization, or at least in the very near future, automating all manufacturing, services, and monitoring humans at home and at work, together with controlling almost all the vehicles on the roads and tracks of first America, China, Europe and then the whole world.
The punch-hole cards of Jacquard's loom instructed the machine how to weave cloth, the Countess of Lovelace's punch-hole cards instructed Babbage's Analytical Engine how to calculate, and Boole's binary algebra instructed a computer how to compute. The machines could comprehend these instruction sets, which then could be seen as written in a machine language.
In a binary computer, encoded strings of zeros and ones constituted the instruction where, for example, in an 8-bit instruction, the first four bits may tell the computer what to do and the last four bits tell it where the data to use can be found.
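That opcode/address split can be made concrete; a minimal sketch, assuming a hypothetical 8-bit machine whose upper four bits name the operation and whose lower four bits give the data address (the mnemonic names are invented for illustration):

```python
# Hypothetical 8-bit instruction: high nibble = opcode, low nibble = address.
OPCODES = {0b0001: "LOAD", 0b0010: "ADD", 0b0011: "STORE"}  # invented names

def decode(instruction: int) -> tuple[str, int]:
    opcode = instruction >> 4         # first four bits: what to do
    address = instruction & 0b1111    # last four bits: where the data is
    return OPCODES[opcode], address

print(decode(0b0001_1010))  # ('LOAD', 10)
```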
There are more than 200 fundamental computer operations, so encoding and keeping track of all the instruction sets in machine language was a nightmare, made worse by the fact that different machines had different designs and structures, and even between similar machines there were no rules for program step sequences and register addresses, so a program written for one machine, even in machine language, was not transferable to another.
The computational logic problem had been solved by Turing and von Neumann, but the practical implementation of programming in machine language was difficult and error-prone, the programs could not be used on different machines, and were almost impossible to understand even by the programmers themselves after the fact.
This provided the impetus for software generalization, and Grace Hopper, an assistant on the UNIVAC project, was assigned by John Mauchly to make their new machine, the Binary Automatic Computer (BINAC), capable of accepting algebraic equations as written by the user.
Hopper realized at once that “one could use some kind of higher code other than the machine code” to program a computer, a code more easily understood by humans that the computer itself could translate into machine language to read and execute.
Since many processes and mathematical computations are repeated during an operation, Hopper and co-workers devised short command mnemonics such as LOAD, STORE, and PRINT, associating them with the relevant machine language instruction sets and combinations thereof, so that simple commands could be used to call up the performance of a desired process.
These subroutines also included mathematical computations, for example taking a square root (SQRT) that, once assembled in machine language, could be stored in a subroutine library, and when compiled with other operations, could be called up by the programmer using the short mnemonic commands to perform the calculation. A sequence of such commands assembled using this assembly language was easier than machine language for someone to write and others to understand.1
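Hopper's scheme of mapping short mnemonics onto stored machine-language sequences can be sketched as a toy translator; the opcode bit patterns and the SQRT subroutine expansion below are invented for illustration, not her actual code:

```python
# Toy assembler in the spirit of Hopper's mnemonics; opcodes and the
# SQRT library expansion are invented stand-ins.
OPCODES = {"LOAD": 0b0001, "STORE": 0b0010, "PRINT": 0b0011}
SUBROUTINES = {"SQRT": ["LOAD 4", "PRINT 4"]}  # stand-in for a stored routine

def assemble(program: list[str]) -> list[int]:
    words = []
    for line in program:
        mnemonic, arg = line.split()
        if mnemonic in SUBROUTINES:            # expand a library subroutine call
            words += assemble(SUBROUTINES[mnemonic])
            continue
        words.append((OPCODES[mnemonic] << 4) | int(arg))  # opcode + address
    return words

print(assemble(["LOAD 9", "STORE 2"]))  # [25, 34]
print(assemble(["SQRT 0"]))             # [20, 52] -- the expanded subroutine
```

The programmer writes the mnemonic once; the translator does the bookkeeping of expanding it into the stored machine-language words, which is the essence of what Hopper named compiling.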
These subroutines once compiled in the computer could then be used by anybody to write programs in assembly language, with the machine doing the work of translating from assembly language to machine language by simply accessing the called-upon machine language instruction sets. Hopper in 1952 called the language translation compiling, and the translator was called a compiler, terms that are still in use today.
During her work for the Navy on the successor to the Harvard Mark I, the Mark II mysteriously crashed, and upon investigation Hopper found that a moth had wandered into the maze of circuitry and died blocking the electric relay switch that caused the malfunction. And so “debugging the computer” became the term for finding and fixing any computer malfunction.
Although much easier to master and use than machine language, assembly language still required designating address registers and knowing all the short instruction mnemonics, which could vary from machine to machine; it also still required an intimate knowledge of each machine's design and workings. The next step, then, was to formulate a higher-level language that was easier to write and understand, and again have the computer translate that language into assembly language and then into machine language, using compilers.
Software development pioneer John Backus at IBM in 1957 created FORTRAN (Formula Translation), which could be compiled for use on the new IBM 704, going from the high-level language to assembly and finally to machine language. FORTRAN employed relatively natural language statements that combined several assembly and machine language instructions into single statements commanding the computer to perform mainly scientific and engineering calculations.
When a computer is turned on, built-in programs instruct the CPU to find the operating system, which then takes control of the system hardware. A few functional programs are permanently stored in ROM, and the CPU executes instructions in machine language after they have been translated from a high-level language program through assembly language by a compiler; RAM typically holds information and programs during processing.
Primarily for business and commercial use, the Common Business Oriented Language (COBOL) could more efficiently file, sort, merge, add, subtract, and calculate percentages over large sets of data, and conveniently generate reports, all using syntactical English. COBOL is still used by businesses and often is a legacy language stored in many companies’ mainframe archives for reference.
As an example of the syntactical nature and universality of COBOL, when Grace Hopper was stranded at a computer center in Japan, and having difficulty in communicating her wish to go back to her hotel to the non-English speaking Japanese programmers, she resorted to COBOL, pointing to herself and saying “MOVE”, and then pointing outside the center, said “GOTO Osaka Hotel”.2
Primarily in Europe, the Algorithmic Language (ALGOL) was designed to be more in tune with fundamental computer science syntactical principles ostensibly to do mathematical computation more elegantly than FORTRAN, but in truth was Europe's putative counter to America's FORTRAN hegemony. Not surprisingly, ALGOL became the lingua franca for European-developed software for a time, but for lack of built-in utilities such as a standardized input/output regime, FORTRAN still held sway in America where most new software was written.
All these languages were based on the imperative programming approach that at any time the program has an implicit state defining the values of all the variables and current point of control, and as the program executes by means of the von Neumann architecture, the programmer can know each and every step and state through which the computer passes by examining the program's dump files.
The primarily mathematical logic and operational functions of imperative programming were expanded to objects in an object-oriented programming approach that simulated systems by grouping data and instructions into direct and explicit representations of modular objects having a set of instructions and relevant discrete portions of data each representing one facet of a given system run.
For example, a banking program would define objects such as savings, checking, and certificate of deposit accounts, and concepts such as balance, interest, and bank charges, while an electrical engineering program would define objects such as resistors, capacitors, and transistors, and concepts such as resistance, capacitance, and gate voltage, and would then classify and compare the objects by following operation instructions. Object-oriented languages would later be employed by artificial intelligence programs written in C++, Java, Python, and JavaScript.
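The banking example can be sketched minimally in Python (one of the object-oriented languages the text names); the class, method names, and 2% rate are illustrative choices, not from the original. The point is the grouping: the data (the balance) and the instructions that operate on it live in one modular object.

```python
# Each object bundles its data (balance, rate) with the instructions
# that act on it; the 2% annual rate is an illustrative assumption.
class SavingsAccount:
    def __init__(self, balance=0.0, interest_rate=0.02):
        self.balance = balance
        self.interest_rate = interest_rate

    def deposit(self, amount):
        self.balance += amount

    def add_interest(self):
        self.balance += self.balance * self.interest_rate

acct = SavingsAccount(balance=1000.0)
acct.deposit(500.0)
acct.add_interest()
print(acct.balance)  # 1530.0
```

A checking or certificate-of-deposit account would be a sibling class with its own data and rules, and the program could then classify and compare such objects uniformly.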
High-level language computer programming progressed from the mainly FORTRAN batch-mode punch-card computer to remote dumb-terminal time-sharing, then to minicomputers like the paper-tape-instructed PDP series, to the BASIC-running, Intel-microprocessor-based Altair, to personal computers that can access any language over the Internet, and finally to online host-computer coding platforms such as GitHub, where the more direct-control C++ language and the higher-level Python and Java dominate the production of new programs, with source code published and freely distributed. This open source software (OSS) was a critical resource for the development of artificial intelligence.
The first electronic communication over distance was Samuel Morse's three-state (off, dot, dash) trinary system representing letters of the alphabet, combinations of which could form words and a message. The signals were sent over dedicated transmission lines, but although successful, the system could only send messages sequentially, one at a time, and so was limited as a long-range communications system. To carry more messages, Emile Baudot thought of varying the time between states for different messages, allowing the messages to be differentiated by their frequency and sent simultaneously over the same line.
This concept of multiplexing was employed for machine communications over dedicated lines in the two-state (on, off) binary-coded telegraph, which used a teletypewriter and a five-bit Baudot code to transmit messages instantly, or to record messages on paper tape for delayed transmission or storage.
Alexander Graham Bell in 1876 constructed a diaphragm in contact with a bag of loose carbon particles and ran a dc current through it; when a sound vibrated the diaphragm, it compressed the carbon particles, changing their density in the bag and thereby modulating the current, producing an electric current that represented the sound. This "voice" current could then be transmitted over a wire to a destination receiver, where the wire was wrapped around an electromagnet in contact with a diaphragm; the diaphragm would vibrate in accord with the voice current, re-forming the speech at the destination.
Multiplexing allowed simultaneous transmission of telephone calls over a single line. Today, multiplexing by different frequencies is called frequency-division multiple access (FDMA); division by coding messages is called code-division multiple access (CDMA), which is used by the GPS Global Positioning System; using allocated time slots is time-division multiple access (TDMA); and using synchronized switches at each end of the transmission line to divide messages is called synchronous-division multiple access (SDMA). In addition, there are many variations and combinations of the different multiplexing schemes, for example Europe's tandem TDMA/FDMA Global System for Mobile Communications (GSM) and China's duplex synchronous time/code-division system called TD-SCDMA.
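Time-division multiple access, for instance, can be illustrated with a toy interleaver: each message owns a recurring time slot on the shared line, and the receiver sorts the symbols back out by slot number. This is a deliberate simplification; real TDMA frames also carry synchronization and guard intervals.

```python
# Toy TDMA sketch: messages take turns on one line in fixed time slots,
# and the receiver de-interleaves the shared stream by slot number.
def tdma_mux(messages):
    stream = []
    for t in range(max(len(m) for m in messages)):
        for slot, msg in enumerate(messages):
            if t < len(msg):
                stream.append((slot, msg[t]))
    return stream

def tdma_demux(stream, n_slots):
    out = [""] * n_slots
    for slot, symbol in stream:
        out[slot] += symbol
    return out

line = tdma_mux(["HELLO", "WORLD"])
print(tdma_demux(line, 2))  # ['HELLO', 'WORLD']
```

FDMA and CDMA achieve the same sharing by separating the messages in frequency or by code rather than in time.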
For Internet communication, space-division multiple access (SDMA) was used on the early wired Ethernet, and the IEEE 802 family of standard protocols was used for wireless routing for local area networks (LANs), WiFi, and Internet access. Improved speed was provided by the capability of electromagnetic waves to be polarized, as in polarization-division multiple access (PDMA), used for cordless telephones, digital radio, ADSL, and LTE. For Internet transmission over fibre optic cables, wave-division multiple access (WDMA) is used.1
For 4G and 5G mobile device communications, including mobile phones, autonomous cars, robots, and many other communication systems, orthogonal frequency-division multiple access (OFDMA) is used. Multiple access, however, causes inter-carrier interference (ICI, cross-talk), because frequency deviations of the sub-carriers affect their orthogonality, so each sub-carrier must be individually shaped to prevent the sub-carrier frequencies from overlapping. Needless to say, all kinds of multiplexing were essential for handling the explosive growth of electronic communications.2
Voice electronic communications are efficiently transmitted by dividing them into packets that can be sent simultaneously over the least busy lines and then reassembled at the destination to re-form the message, a process called packet-switching.
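Packet-switching can be sketched in miniature: number the fragments, let them travel by any route and arrive in any order (simulated here by shuffling), and reassemble them by sequence number at the far end. The 4-character packet size is an arbitrary illustrative choice.

```python
import random

# Toy packet switching: split a message into numbered packets, let them
# arrive out of order, and reassemble by sequence number.
def packetize(message, size=4):
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(packets):
    return "".join(payload for _, payload in sorted(packets))

packets = packetize("the quick brown fox")
random.shuffle(packets)       # packets may take different routes
print(reassemble(packets))    # the quick brown fox
```

Real protocols add headers with addresses, checksums for error-checking, and retransmission of lost packets, but the divide-route-reassemble idea is the same.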
Early personal computer communication was carried by telephone lines through modulation of a carrier wave between the transmitter and the receiver modem, at a rate of transmission called the baud rate (in honor of Baudot). Anyone of a certain age will remember the consternation of dealing with unstable modem communications.
Directly-wired inter-computer communications were much faster and more reliable than telephone-line transmissions, but in the beginning directly-wired computers could only communicate through their host network server, and since the communication operating systems of different host network servers might not be compatible, not all computers could communicate with each other.
Oversight, regulation, and standards-setting were clearly necessary for personal computer inter-communications. Rules for compatible communication were set by the Internet Engineering Task Force (IETF), a loose non-profit organization through which anyone can contribute expertise to form protocols such as the Internet Protocol (IP), Hypertext Transfer Protocol (HTTP), and Simple Mail Transfer Protocol (SMTP), which dictated the form, transmission rate, and error-checking methods for inter-computer communication. Each host server communication operating system thereafter was obligated to follow the protocols in order to communicate with the others, thereby de facto mandating transmission standardization for inter-computer communication.3
Among the earlier protocols for form was the American Standard Code for Information Interchange (ASCII), a coding system of 128 strings of seven ones and zeros, each sequence representing an Arabic numeral, a letter of the English alphabet, a punctuation mark, a symbol, or an action such as the keyboard's ENTER; it became the universal standard for all computers in 1963.
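The seven-bit patterns are easy to inspect; Python's built-in ord() returns the same code points the 1963 standard assigned, shown here as seven ones and zeros.

```python
# Each ASCII character is one of 128 seven-bit patterns; ord() gives the
# standard code point, formatted here as seven binary digits.
def to_ascii_bits(text):
    return [format(ord(ch), "07b") for ch in text]

print(to_ascii_bits("Hi!"))  # ['1001000', '1101001', '0100001']
```

For instance 'H' is code 72, which is 1001000 in binary; any two ASCII machines therefore agree on every character of a transmitted message.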
In the beginning of the computer age, computers were all centralized mainframes with batch processing and on-site time-share dumb terminals that required the user to trek to the computer center to use the computer. "Computer time" was a fare-based proposition charging for use, and although it may have improved program efficiency, it may also have dissuaded some from using the main computer at all because of the "computer charges".
Earlier, in 1961, John McCarthy, a co-founder of the MIT Artificial Intelligence Laboratory, suggested remote time-sharing on the IBM 709, and the proposal subsequently led to the Department of Defense Advanced Research Projects Agency's ARPANET in 1969, linking defense-work research laboratories in Utah and California. This was the first fully functional packet-switched network, and its progeny would be critical for communication among the developers of artificial intelligence.
With the advent of distributed computing power came the idea of distributed computers themselves. The demand was met primarily by the Digital Equipment Corporation's (DEC) series of newly-coined minicomputers, which they called Programmed Data Processors (PDP-n), and by other companies such as Hewlett-Packard, Honeywell, and General Electric. The minicomputers did not have a mainframe's processing power, but they were smaller and cheaper, could be programmed for the specific tasks of an individual entity's projects, and could also link to a mainframe.
At the time, however, each company's minicomputer came with its own specific design and operating systems, so programs for it had to be written from scratch in assembly language for each minicomputer, and if a company or department upgraded or changed to a different vendor's minicomputer, all the old programs and routines were useless and had to be re-written for use on the new minicomputer.
At Bell Labs, researcher Ken Thompson's Solar System Simulation program was loaded on a PDP-7 with good graphics, but that minicomputer to his chagrin lacked basic utility routines such as copy and print, so he and colleague Dennis Ritchie wrote the utilities in assembly language and added file management and text-editing. They then loaded these routines into one of Bell Labs’ newly-purchased PDP-11s that was faster with more memory than the PDP-7.
Although impressed by Thompson's Space Travel graphics, other researchers at Bell Labs were more interested in the utility routines, and so in response to many requests, Thompson and Ritchie wrote up an operating system manual for the PDP-11s, calling the system UNIX. The system was upgraded for time-share management in 1972 and further expanded to hold 100 modules for recording instrument readings and for sorting, linking, and analyzing data, which could be sequenced to run a user's specific research program. All the programs and utilities could be stored in a single PDP-11, which was thus available as a stand-alone computer for a researcher's specific needs.4
Most of the scientists and engineers at Bell Labs, however, did not know how to write computer programs in assembly language, let alone machine language, so Thompson and Ritchie wrote a higher-level language for programming the UNIX operating system that they called “C” (as the successor to the limited Basic Combined Programming Language called “B” then in use). Born in 1973, C is the progenitor of the C++ language now widely used for coding artificial intelligence algorithms.5
With a nominal-fee license for UNIX available to industry and universities using PDP computers, and with the publication of C in 1974, DEC's $50,000 PDP-11 loaded with UNIX soon became the de facto standard distributed minicomputer for science and engineering use. The PDP-11 minicomputer could not only serve as a stand-alone, multipurpose computer; using a central kernel it could manage data and program files, control laboratory equipment, and text-edit and data-format reports, and through a shell it could time-share and program-share over multiple local and remote networks.
Berkeley in 1977 established a free and open “host-computer coding platform” based on UNIX called the “Berkeley Software Distribution” (BSD), the harbinger of today's platforms such as GitHub and Red Hat that are now critical for the development of artificial intelligence.
In 1978, DEC's faster and more powerful VAX 11/780, with a 4.3-gigabyte virtual address space, took over as the mainstream distributed computer for industry, universities, and research institutions.
Since any computer with a UNIX OS and a C compiler could employ the UNIX system, by 1983 more than 80% of university computer science departments had adopted the free UNIX inter-computer communications system.
In the same year, however, the Department of Justice's antitrust consent agreement with AT&T was lifted, and UNIX suddenly became a commercial product in high demand. Because the source code was licensed only as-is, its travels through the commercial computer world produced many different versions of UNIX at almost every stop, each with its own idiosyncrasies; the erstwhile standard communications formats were Balkanized, decreasing UNIX's reach. The high-priced commercialization of UNIX led to Richard Stallman's free GNU ("GNU's Not UNIX"), to be described in the next chapter.
Way back in 1971, Ray Tomlinson sent the first email using the @ symbol for the address, and in 1973, the local area network (LAN) for computer connections within institutions and companies began using high-capacity Ethernet coaxial cables for sharing data and information.
At Stanford in 1980, Computer Science grad student Leonard Bosack wanted to computer-communicate with his girlfriend Sandy Lerner, also a Computer Science Department graduate, who was managing the Stanford Business School Computer Lab. They legendarily trudged through the underground cross-campus maintenance tunnels stringing cable to connect their computers, and so came up with the idea of routers and servers to make LAN connections for the different departmental computers at Stanford, the basis for starting a company in 1984 called Cisco to provide hardware and software for networking computers.6
To further expand wide area networks (WANs), client computers could access the packet-switching servers in accord with the 1983 Transmission Control Protocol/Internet Protocol (TCP/IP), which today governs how messages are broken up into packets, sorted, and transmitted to IP addresses (currently IPv6) resolved through the Domain Name System (DNS), to be reassembled at the destination.
In the next few years in quick succession, Tim Berners-Lee at CERN launched the World Wide Web in 1989, which identified websites by Uniform Resource Locators (URLs) linked using the Hypertext Markup Language (HTML), so that all the nascent web browsers could access websites all over the Internet.
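The anatomy of a URL is visible with Python's standard urllib; the example address is the URL of CERN's first website, used here purely for illustration.

```python
from urllib.parse import urlparse

# A browser resolves a URL by its parts: the scheme names the protocol,
# the netloc names the host, and the path names the document on it.
parts = urlparse("https://info.cern.ch/hypertext/WWW/TheProject.html")
print(parts.scheme)  # https
print(parts.netloc)  # info.cern.ch
print(parts.path)    # /hypertext/WWW/TheProject.html
```

The host name is then resolved to an IP address through DNS, and HTTP fetches the HTML document at the path, whose hyperlinks are themselves URLs.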
Marc Andreessen at the University of Illinois developed Mosaic, the first general-purpose web browser for public use, in 1993 and co-founded Netscape in 1994; Brian Pinkerton at the University of Washington in 1994 invented the WebCrawler (spiderbot) for universal Internet searching, and in the same year Jerry Yang and David Filo at Stanford unveiled the first tree-structured hierarchical Web directory, Yahoo.
Everyone knows the story of graduate students Sergey Brin and Larry Page's further development of the web crawler, also at Stanford, and the 1998 search engine that turned the company's name "Google" into a verb for online search. The revolution in data acquisition and the sheer volume of data progressively accumulated, such as ImageNet's more than 14 million labeled images serving as AI training sets for computer vision, would have a profound influence on the progress of the bottom-up learning of artificial neural networks.
All the data and information swirling about the exponentially increasing number of computers required faster and more reliable transmission over broader bandwidths. The early transmission over telephone lines at 9600 baud (bits per second) was of course inadequate; today satellite communications, 1000 Mbps fibre optic networks, and terrestrial wireless microwave transmission at 2–4 GHz frequencies are the main carriers of 4G and 5G telephony and Internet data and information.
Widespread, fast, and convenient computer communication would be of little help in developing new ideas and innovative software if source code and techniques were proprietary and kept confidential. The free and open publication and use of open source software (OSS) for software development, and even crowdsourcing for new ideas, particularly using the free access of host-computer coding platforms, are critical to the development of artificial intelligence.
The open source software ethos was born at the MIT Tech Model Railroad Club where a successful design of some complex railroad operation was completed by someone who just “hacked away” at the circuits until the model train behaved as planned. The good circuits of course were incorporated into the grand model and thereby accessible to all the members of the club.
From the old realm of railroads, the Club's interests naturally turned to the new world of computers. The free-spirited young men informally promulgated a completely unrestricted flow of ideas while they developed the first rudimentary computer games running on a PDP-1 minicomputer.
The hacker spirit quickly crossed the country to Palo Alto's Homebrew Computer Club, and the magazine Popular Electronics spread the Hacker Gospel to all points in between, where boys with a mathematical/electronics bent could try their hand at the new tech. Not only Apple but also Microsoft and many other pioneering companies were eventually spawned in this juvenile but heady atmosphere of computer hardware and software development, spurred on by the free and open exchange of ideas and programs.
The most popular home-built hardware was based on the Altair 8800 home computer kit running on an Intel 8080 CPU, featured on the January 1975 cover of Popular Electronics; the software initially used the DOS assembly language operating system, later changing over to the more popular CP/M.
But it was one of the hacker acolytes, Bill Gates, who crossed the Rubicon with his lambasting of the Homebrew boys for copying and distributing his $500 BASIC programing system without paying him for it.
The inventor of the personal computer, Steve Wozniak, remained true to the hacker ethos, freely distributing his computer designs to his confreres at Homebrew and indeed to anyone interested. But after Wozniak hacked AT&T's very expensive long-distance calling by electronically mimicking AT&T's 2600 Hz switching tones for free long-distance calls, he demonstrated his "Blue Box" to his 17-year-old friend Steve Jobs, who early on displayed his marketing acumen by going around to university dormitories, where students made many calls home and to friends, selling more than 100 Blue Boxes for $170 each and earning substantial pocket change for himself and Woz.1
Jobs credited their Blue Box with making the Apple computer possible,2
If we wouldn’t have made blue boxes, there would have been no Apple. Because we would have not had not only [sic] confidence that we could build something and make it work … but we also had the sense of magic that we could influence the world
But the Blue Box clearly infringed AT&T's technology and trade secrets and Woz and Jobs were fortunate to evade the police and an FBI investigation. Both Jobs and Gates however later turned to the law as helpmate in their excoriation of copyists, conveniently forgetting the fact that many of Apple computers’ later features were copied, being invented by others at Stanford Research Institute (today SRI), Xerox Palo Alto Research Center (PARC), and the Berkeley Software Distribution (BSD), and many of Microsoft Windows’ features were simply copied from Apple's Macintosh.
Jobs’ desideratum however was to establish an enduring company that would create innovative new products and supply demands (some of which even consumers themselves were unaware), and thus provide new utility that would benefit society if properly used (for example, the mobile phone makes ride-hailing possible), but Apple seemed to be forever in court charging copyright and patent infringement of very simple technology (for example the curved iPhone edges and proportions) in an effort to drive out competitors.
The less innovative but more opportunistic Gates was simply riding on IBM and the PC clones’ near monopoly to make exorbitant rent-seeking profits from his operating system and applications software licenses. The never-ending new Windows versions reflect Gates’ concentration on a commercial strategy that exploited consumers’ desire to have (not necessarily useful) new features and the latest tech-fashion products.3
After amassing his fortune from his personal computer operating system monopoly, Gates’ late-blooming philanthropy for well-selected worthy causes, although laudable, appeared to some as conscience-stricken amends for past commercial transgressions, perhaps being purchases of wishful tickets to an undeserved Heaven.
Microsoft ostensibly has been re-engineered as a more open and socially responsible company, with the publication and licensing of its holy operating system source code and a promise to be more cooperative under the direction of the new CEO Satya Nadella. But can the ultimate commercial predator really change its spots?
Fortunately for the development of artificial intelligence, software has been blessed by the rebirth of the hacker spirit in the Open Source Initiative (OSI), a non-profit organization promoting free use and openly distributed source code for participation in new software development through host-computer coding platforms.
Corporate open source however was incongruously hatched at AT&T, a fierce protector of its intellectual property rights. The reason is that before the 1960s, AT&T was designated as an officially sanctioned monopoly of America's telephone system under a Consent Decree by the Department of Justice to ensure compatible public utility services, and indeed for generations the stodgy black dial phone was the only telephone model available, while all the phones, switchboards, relay stations, and land lines were produced by AT&T's manufacturing subsidiary Western Electric. To prevent the further expansion of AT&T's monopoly, the Justice Department prohibited AT&T from any activities outside of telephone communications, and particularly in the burgeoning new field of computers.
And so in the 1970s the source code for Ken Thompson's UNIX operating system and its derivatives for inter-computer communication developed at Bell Labs was freely distributed to universities under a General Public License (GPL), and for a nominal $20,000 for commercial use. UNIX quickly became the de facto standard science and engineering communication link serving universities, research institutions, technology companies, and government agencies.
However, after the government-mandated dissolution of AT&T in 1983 into regional “Baby Bells”, and the grievous dissolution of the great Bell Labs, AT&T, at last free from the antitrust Consent Decree, quickly reverted to its commercial roots, deeming UNIX proprietary and its use subject to substantial licensing fees of up to $250,000. Its source code was strictly confidential and protected by copyright; copies were distributed in object code only, which, although not human-readable, was also protected by copyright.4
And so, although birthing the concept of a General Public License, AT&T turned around and terminated the idea of commercial open source software by establishing the onerous royalty-bearing licensing regime subsequently employed by, among others, Microsoft, and the legal basis for infringement suits that would be aggressively pursued by, in particular, Apple.5
AT&T UNIX's now-closed system rankled the young members of the fast-developing academic discipline of computer science, and BSD created many competitive alternatives. But in particular, the acknowledged computer science genius Richard Stallman took it upon himself to develop a competing inter-computer operating system he called “GNU”, a recursive acronym for “GNU's Not UNIX”, the source code of which was public, free, and freely distributable, and which included a developmental toolkit to encourage participation in further development of the open operating system and its implementation and applications.
Mounted on the open range wildebeest GNU trampling through the halls of computer science departments and the homes of hackers worldwide, the wild-eyed Stallman offered open source software for all to freely use and distribute, all the while excoriating Apple and Microsoft's denials of public access to source code as veritable “crimes against humanity” that would stifle the growth of the new computer software discipline and its related industries.
Apple and Microsoft, nonplussed, in effect replaced AT&T and IBM's monopolies with their own, and were busily restraining the trade for personal computer operating systems and applications software with onerous licensing programs (Microsoft) and vigorous enforcement of patents and software copyright (Apple).
It is of utmost irony that one of the first permissive host-computer coding platforms, BSD, was sued by AT&T for copyright infringement of UNIX in 1992 (delaying BSD's further development for two years), exemplifying the copyright enforcement zeal of Microsoft and Apple, while Microsoft later cynically used BSD-produced code in Windows 2000 and Apple's macOS and iOS operating systems were largely based on FreeBSD. It seems that in the commercial world of computer software one can have one's cake and eat it too.6
In his fight against secret and proprietary software, Stallman's crusade found a kindred spirit during a speech in far-away Finland. Inspired by his talk, the University of Helsinki grad student Linus Torvalds, using the GNU toolkit, developed the eponymous Linux as an open source competing operating system for personal computers. He published the source code online in 1991 under a GPL, attracting many young hackers to not only use it as a substitute for Microsoft's OS, but also inviting them to improve it and develop new utilities and application programs based on it in an early instance of crowdsourcing.
Linux liberated the development of computer operating systems and application software from the tyranny of Microsoft's secret source code, providing new tech start-ups an operating system platform on which to develop their own new software. Apple has remained a bastion of closed-system, overpriced proprietary hardware and software; yet despite its products’ planned obsolescence and marketing subterfuges, each new product, launched with Jobs’ now de rigueur new-product stage performances, became the darling of the high-tech fashionista.7
Notwithstanding, while the world mourned the death of the tech icon in 2011, when asked for his reaction, ever-true to his open source creed, Stallman was reported to have said, “I’m not happy that he died, but I am glad that he is gone”.8
It has come to pass that the conflict between the harsh commercial despotism of Apple's Steve Jobs and Microsoft's Bill Gates against the dreamy idealism of Richard Stallman and Linus Torvalds would find common ground in a compromise between crass rent-seeking and idealistic free and open software in the development of artificial intelligence.
It can be understood that the notoriety that motivates a boy to freely show off his coding skills may not suffice for the man who has to feed a family. The creativity and hard work in his field no doubt deserve proportionate remuneration, but on the other hand, the free and open exchange of information and ideas in that field are necessary to its development, and now critically so for the far-reaching science and technology of artificial intelligence.
So the tech idealists at the University of California, in conformance with the Berkeley quintessence, together with kindred spirits at the citadel of enlightened engineering MIT, devised a GPL derivative, the Permissive Software License (PSL), and established the BSD and MIT License regimes that advocated the open source development of new software, but with a nod to entrepreneurship, permitted the exploitation of newly-developed commercial software that did not require publication of the source code.
So a potentially lucrative software product could be protected as intellectual property, rewarding its creator with exclusive rights, but the software genesis of that product was open source, free for anyone to use. In this way, commercialization could encourage new ideas, but the tools necessary for manifesting the idea would be free. For example, the languages C++, Java, and Python; canned numerical analysis and linear algebra programs such as those available in Wolfram, Matlab, and Octave; plus numerous free tools, toolkits, and applications programs, together with online programming tutorials and host-computer source code platforms such as RedHat and GitHub, are all available to help guide and realize the new product. This type of developmental regime was instrumental in the creation of new artificial intelligence algorithms.9
For example, in the fight against the coronavirus and its variants, the Harvard Medical School protein folding algorithm source code was made publicly available on GitHub; Baidu released its Linearfold mRNA protein folding algorithm; the Washington University (St. Louis) Folding@Home open source link of personal computers ran at exaflop (10^18 ops) capacity for crowd-sourced protein folding development; and IBM, Google, Amazon, and Microsoft have all made their computer power available to research vaccines against the coronavirus. The Intel-Argonne National Laboratory cooperative Aurora supercomputer project will also take part in protein folding calculations to find anti-viral drugs for treatment and vaccines.
The proposal of the waiving of patent rights for vaccine development and manufacture is also an indicator of the trend towards open source. It is gratifying that the world is progressing to open source in the face of an overwhelming global pandemic; let us hope that open source remains the basis for the development of artificial intelligence after the pandemic withers away.
In 1956, at the inaugural Dartmouth Summer Research Project on Artificial Intelligence, the AI pioneers John McCarthy and Marvin Minsky set out their goal of a pure top-down artificial intelligence machine in a grant proposal,1
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it
Their conjecture was philosophically based on the monism of Descartes’ mind-body problem; that is, an abstract thought and the physical brain are not separable, and thus a thought could be produced by a physical machine.2
Two attendees at the conference, Allen Newell and Herbert Simon, picked a task that they believed would be an ultimate proof of the conjecture: a machine that could prove mathematical theorems.3
From the rules of symbolic logic as set forth in 1913 by Alfred North Whitehead and Bertrand Russell in their epochal three-volume tome Principia Mathematica, a set of symbols is used to represent tautologies such as “All unmarried men are bachelors”, inferences such as “if A implies B, and B implies C, then A implies C”, and the mathematical concepts of uniqueness (∃!x P(x)) and completeness (for every x it is also true that P(x)).
The Logic Theorist, using the innovative Information Processing Language (IPL), a forerunner of the List Processor (LISP) artificial intelligence language, converted words and phrases into binary-coded symbols, and proved theorems by comparing and matching symbol strings in a tree search where the root is the hypothesis, each branch is a deduction founded on symbolic logic, and the objective is at a twig of the tree; the proof is the trajectory from root through branch to twig.
There are many different choices and combinations that one can take along the way to reach the goal. One can try all the possibilities one by one, checking the result, and if it is found wanting, backtrack to try the next possibility. Even for restricted-domain problems such as the proof of some basic theorems of mathematics and the playing of board games, this exhaustive search method encounters a combinatorial explosion of possibilities that would require today's supercomputers to compute.
Good mathematicians of course do not do mathematics solely by trial-and-error exhaustive search, and human chess and Go players also do not play in such a pedestrian manner, rather they use knowledge, logic, and heuristics to reduce the number of choices, formulate strategies, and gain experience.
The top-down proponents’ idea for expert systems was to hard-wire the fundamental operations, collect the knowledge and heuristics of the best practitioners, add means-end analysis to feed back information on whether a process was on the right track, and then, working backwards from the desired result, pick the best paths to success.
Altogether this was called hill-climbing because a mountain-climber's step up rather than down is generally more likely to contribute to the goal of reaching the summit. However, reflecting the complexity of any endeavor, a step down might lead to an unforeseen lower level roped passage to the top, so the hill-climbing trajectory approach is obviously not absolute; examples in mathematics might be proof by contradiction (tertium non datur), and in games, queen sacrifice in chess and tenuki and sente in Go.4
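The hill-climbing idea can be sketched in a few lines of Python. This is an illustrative toy, not any chess or Go program: the objective function and the neighbor-step rule here are assumptions chosen only to show the "always step up" behavior and why it can stop at a local summit.

```python
# Minimal hill-climbing sketch: repeatedly take the neighboring step that
# most improves the objective, and stop when no neighbor is better.
# Note this greedy rule is exactly why a needed "step down" (as in the
# roped-passage example in the text) is never taken.

def hill_climb(objective, start, neighbors, max_steps=1000):
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=objective, default=current)
        if objective(best) <= objective(current):
            return current            # no uphill step available: stop here
        current = best
    return current

# Toy landscape: f(x) = -(x - 3)**2 has its single summit at x = 3.
f = lambda x: -(x - 3) ** 2
step = lambda x: [x - 1, x + 1]       # neighbors: one step left or right
peak = hill_climb(f, start=0, neighbors=step)
```

Starting from 0, the climber steps rightward until it reaches 3, where both neighbors score worse and the search halts.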
The Logic Theorist succeeded in proving Principia theorems, even in ways different from the Principia proofs themselves, delighting Bertrand Russell himself.
From this success, top-down expert systems could break a big problem into smaller sub-problems, tackling each in turn by if-then program steps, and integrating all the sub-problems into a core inference engine that in principle could solve any problem.
For example, if the problem is to win an American football match, each series of downs is a sub-problem, and a traditionally-trained head coach would reason: if you are in your own end of the field, and if it is fourth down and 8 yards to go, then punt. However, game strategy and personnel might dictate otherwise; for instance, you have a good fake-punt play, a punter who can run or pass, and a coach whose perceived conservatism would make the fake a surprise. The expert system could objectively take other options into consideration, not least because it can disregard unsubstantiated coaching homilies and accept the assumption of risk (short-term, losing the game; long-term, getting fired).
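The fourth-down reasoning above can be written as explicit if-then rules. This is a hypothetical sketch: the function name, thresholds, and the fake-punt override are illustrative assumptions, not any real coaching system.

```python
# Toy if-then rule base for the fourth-down sub-problem in the text.
# yard_line is the distance from the team's own goal line (0..100).

def fourth_down_call(yard_line, yards_to_go, has_fake_punt=False):
    if yard_line < 50 and yards_to_go >= 8:
        # Traditional rule: deep in your own end and long to go -> punt...
        if has_fake_punt:
            return "fake punt"        # ...unless surprise is available
        return "punt"
    if yards_to_go <= 2:
        return "go for it"            # short yardage: risk it
    return "punt"

call = fourth_down_call(20, 8)        # the textbook case: punt
```

The point of the sketch is that the rule base, unlike a human coach, applies its conditions the same way every time, and an extra rule (the fake-punt branch) is simply another if-clause.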
Success in tackling the sub-problems integrated into the total match will more likely result in victory, so an expert system coaching inference engine, due to its logically-organized approach, should perform better than a human head coach.
The ultimate test of the top-down expert system was Deep Blue. Each state of a chess game presents a particular arrangement of the pieces, with the player faced with a static array of legal chess moves that must be evaluated. An evaluation function is used to see if a particular move is probably beneficial to the player; for example, if a move captures an opponent's threatening heavy piece. A minimax algorithm with alpha-beta pruning reduces the number of branches, and a progressive deepening of the tree search speeds up the evaluations by making decisions at higher (earlier) levels in the tree.
Deep Blue's minimax used a data structure decision tree having nodes with a root node at the top, and children descending from the nodes, and a branching factor b that is the number of children connected to the node, and the depth of the tree d is roughly the number of nodes of decision-making from the root down to the farthest leaf node.
In a typical chess game with a depth of d = 100, if b = 3 (not a very bushy tree), the number of leaf nodes (decisions) is given by b^d = 3^100, which means, as estimated by the mathematician Claude Shannon, that there will be about 10^120 static evaluations to be made in a game. A level of seven layers makes a mediocre player, and at least 15 layers are required for a Grandmaster.
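The combinatorial explosion is easy to check directly; a quick sketch using the b and d values from the text:

```python
# Leaf count of a game tree with branching factor b and depth d is b**d.
b, d = 3, 100
leaves = b ** d

# 3**100 is roughly 5.15e47 -- a 48-digit number, already far beyond any
# exhaustive search, and still tiny next to Shannon's full-game estimate
# on the order of 10**120.
digits = len(str(leaves))
```

Even evaluating 200 million positions per second (Deep Blue's rate, quoted below) would make no dent in a number of this size, which is why pruning is indispensable.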
Even at a rate of 200 million evaluations per second, exhaustive searches would not be feasible, so the search tree must be alpha-beta pruned with up-and-down minimaxing, with the calculations parallel-processed, and must be capable of performing uneven tree development using extended blow-up searches to take care of special encounters, such as castling or a vulnerable queen.
Alpha-beta pruning is a method, based on minimax theory, of eliminating unproductive routes by going up and down from the root through branches to leaves in a tree search to find the optimum route to the goal node; the less effective routes can be eliminated (pruned), thus reducing the combinatorial possibilities.
Alpha is the best already-explored route for the maximizer and beta is the best already-explored route for the minimizer; the possible routes are shown in the simplified search tree in the figure below.
Start with the worst-case values for a route: for the maximizer it is −∞ and for the minimizer it is +∞. At the outset, there are no initial values except at the leaf nodes of the search tree, which evaluate the “goodness” of a final position before any new moves are made. These leaf node static evaluations can be variously chosen; a simple example for chess could be winning an opponent's heavy piece, such that the player's heavy pieces will be the greater at that point in the game.
Starting from the root of the search tree, a player goes along the branches one by one comparing values, which at first are always +/−∞, allowing no evaluations until a leaf node is reached, at which point a comparison can be made. Then, going backwards through all the branches one by one towards the root, when there is a value that is better (higher for the maximizer and lower for the minimizer), that branch will be the best choice and all the worse branch choices can be discarded. This “pruning” of the search tree reveals the routes with the least “noise”, and all the other branches may be discarded.5
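The procedure just described can be sketched compactly in Python over a toy game tree, where nested lists alternate maximizer and minimizer levels and the leaves are static evaluations. This is only an illustrative skeleton; Deep Blue's hardwired evaluators and parallel search were of course vastly more elaborate.

```python
# Minimax with alpha-beta pruning over a nested-list game tree.
# Leaves are static evaluation numbers; alpha/beta start at -inf/+inf,
# the worst-case values described in the text.

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, list):        # leaf: return static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:             # cutoff: prune remaining children
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                 # cutoff for the minimizer side
            break
    return value

# Classic two-ply example: the maximizer can guarantee a value of 3,
# and the second and third min-nodes are partly pruned along the way.
tree = [[3, 5], [2, 9], [0, 1]]
best = alphabeta(tree, maximizing=True)
```

Tracing it by hand: the first min-node settles at 3, which becomes alpha; in the second min-node, the leaf 2 already falls below alpha, so the leaf 9 is never examined, which is exactly the pruning the text describes.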
Despite being called “progressive deepening”, the iterative deepening depth-first search (IDDFS) in practice reduces the ultimate tree search depth by iterating earlier on a probably beneficial branch (for example, because it is a good move based on some chess heuristic) until a good node is found in that branch.
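A minimal IDDFS sketch: run a depth-limited depth-first search at limits 0, 1, 2, ... until the goal turns up. The tree here is a plain adjacency dictionary with hypothetical node names, chosen only to illustrate the mechanism.

```python
# Depth-limited DFS: search no deeper than `limit` edges below `node`.
def dls(tree, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in tree.get(node, []):
        path = dls(tree, child, goal, limit - 1)
        if path:
            return [node] + path
    return None

# Iterative deepening: repeat DLS with growing limits, so shallow goals
# are found early and no branch is explored deeper than necessary.
def iddfs(tree, root, goal, max_depth=10):
    for limit in range(max_depth + 1):
        path = dls(tree, root, goal, limit)
        if path:
            return path
    return None

tree = {"root": ["a", "b"], "a": ["c"], "b": ["d"], "d": ["goal"]}
path = iddfs(tree, "root", "goal")
```

The shallow passes repeat some work, but because tree size grows geometrically with depth, the repeated shallow levels cost comparatively little, which is why the technique pays off in game search.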
Deep Blue played by doing the tree search using hardwired move generators acting as evaluators that alpha-beta pruned based on a Grandmaster's determination of leaf node values in published matches. Opening books and endgames were provided by Deep Blue's Grandmaster advisors and stored for random access.
An example of an evaluation function is the well-known tactic that rooks and bishops could be advantageously placed on or near files where there is an option of early “opening up the files” by removing the pawns on the files (typically by pawn capture) and thereby exerting long-range and long-term pressure on the opponent's heavy pieces. The file of course must not be opened prematurely allowing the opponent to notice and disrupt or challenge. The rooks and bishops rather should be set up on a “potentially open” file and the player should wait for an opportune moment to open the file.
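In the spirit of the open-file idea above, a toy static evaluator might combine a material count with a small open-file bonus. The piece letters, standard material values, and the 0.5 bonus weight are illustrative assumptions for the sketch, not Deep Blue's actual evaluation terms.

```python
# Toy static evaluation: material balance plus a bonus for each white
# rook already posted on an open file. Positive scores favor white.

PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def evaluate(white_pieces, black_pieces, white_rooks_on_open_files=0):
    """Pieces are lists of letters, e.g. ['r', 'r', 'b', 'p']."""
    material = (sum(PIECE_VALUES[p] for p in white_pieces)
                - sum(PIECE_VALUES[p] for p in black_pieces))
    return material + 0.5 * white_rooks_on_open_files

# White: rook + two pawns; black: bishop + pawn; one white rook on an
# open file -> (5+1+1) - (3+1) + 0.5 = 3.5 in white's favor.
score = evaluate(["r", "p", "p"], ["b", "p"], white_rooks_on_open_files=1)
```

A real evaluator sums many such weighted terms (king safety, mobility, pawn structure), but leaf values of this general shape are what the alpha-beta search compares.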
As for overall strategy and in-game responses, board patterns must be recognized. An experiment with champion chess players as subjects found that they could almost immediately memorize rational board layouts, but could not memorize irrational layouts; it was concluded that professional chess players play by pattern recognition of the disposition of the pieces on the board, and through experience, heuristics, and skill can rationally respond to a logical board. A face screwed up in confusion during a match is a sure sign of an irrational board disposition contrary to their experience and/or logic. As for the computer, its program may be discombobulated by an “unrecognized” pattern, causing some weird responses.
For example, when Deep Blue's predecessor, Deep Thought, the first computer to defeat a Grandmaster in a sanctioned game, was presented with the admittedly unlikely hypothetical board layout shown in the figure at right, it would be clear to a human player that black has an enormous advantage in pieces with two rooks and a bishop, while white has only a king and pawns, but those pawns are arrayed in an airtight defensive line. A draw could easily be achieved by just moving the white king around behind the line of pawns. But because the material gain was available with no immediate personal threat, Deep Thought took the black rook with its pawn, and in doing so destroyed the pawn-line defense, making defeat inevitable.6
Even a pedestrian human player, having seen the pattern on the board, would not have made that mistake, but Deep Thought apparently did not recognize the unusual board pattern and was top-down programmed to capture pieces and attempt to checkmate black's king. Deep Thought did what it did, and the expert system in this case was no more than a bungling amateur, proving that although “expert” it could also be stupid.
Deep Blue, with more memory, faster processors, and more chess-playing heuristics, could be programmed to avoid Deep Thought's naïveté, but the idea that unusual or irrational plays could confuse the computer had germinated, and Kasparov, for one, believed that this was the key to victory over a machine.
Like all great champions, Kasparov assiduously studied his opponent before the match to find playing proclivities and weaknesses. From his pre-match training with other chess computers, Kasparov evidently realized that, because of the sheer thoroughness of search for good moves that a computer can bring to bear, a straight-up match would be difficult to win, and so he devised an “anti-computer” strategy of deliberately playing suboptimal moves that Deep Blue would not recognize; just as with Deep Thought and the strange pawn-line defense, he thought that this was the chess computer's weakness that could be exploited.
Therefore in Game 1, even with the white advantage, Kasparov abandoned normal openings, and his heavy pieces never left his own half of the board even when Deep Blue's black bishop's potential diagonal was evident, as shown in the figure at right. For the first nine moves the play was closed and highly positional, but with a disconcerting 10.e3, Kasparov confirmed his suboptimal laying-back strategy, as the normal move would be 10.e4, offering a pawn exchange.
Deep Blue, however, did not capitalize on this suboptimal move to gain the initiative, because in this instance, its automatic tuning of the evaluation function had increased the weighting for certain types of moves, but in extreme cases the maximum weight was reached, and this saturation meant that Deep Blue no longer distinguished a very bad position from an even worse position. Kasparov believed his strategy was working.
The saturation bug was found and corrected later. After a few more questionable moves, Deep Blue lost Game 1 but knew why it lost, while Kasparov had only gained an expected white victory in the opening skirmish; in winning, however, his belief in his suboptimal-move lay-back strategy was reinforced.
In Game 2, Kasparov's anti-computer strategy fell apart when Deep Blue, affirmatively responding to the lay-back, pinned down Kasparov's heavy pieces to his back rank utilizing rook and bishop potentially open files and diagonals, taking the offensive and final victory. In light of his success in Game 1, Kasparov did not believe that Deep Blue could adapt and counter his strategy without human operators recognizing the sub-optimal moves and changing tactics in-game. He complained after the match,
You know, [the anti-computer strategy] was working, but suddenly it stopped working – suddenly Deep Blue found a way just to break the pawn chains and start a confrontation in a very, very convenient situation.
Deep Blue's last move in Game 2, Ra6, deemed a “highly dubious but excellent move” by commentators was sufficiently shocking to end the game with a full point for Deep Blue. Kasparov's suspicions and accusation of cheating no doubt affected his playing, and were likely factors in his missing a substantive chance for a draw in that Game 2 as pointed out in after-game analysis by his seconds, and his black thus had missed a chance for a critical half-point.
Kasparov stubbornly continued his lay-back anti-computer strategy in Game 3, even with white. At the very outset, his 1.d3 was a move that likely had never before been used at the highest levels of chess as an opening. After presumably waiting for Deep Blue to blunder as it had in Game 1, Kasparov allowed Deep Blue to set up an impenetrable defense as shown in the figure at right, and Kasparov offered a draw, losing his white advantage and a half-point, his 1.d3 pawn forlornly still sitting there at the end of Game 3.
In Game 4, after the strange moves in Game 1 attributed to evaluation software bugs had been repaired, upon Kasparov's 43rd move Deep Blue self-terminated: a piece of code that monitored the efficiency of a parallel search shut the program down when that efficiency dropped below a given level, which it did, much to the operator Hsu's chagrin.
Deep Blue had to be rebooted, and according to human vs. computer chess match protocol, the time it took to restart was charged to the computer side. Under time pressure, its subsequent moves were not optimal, and after the 56th move Deep Blue's endings ROM indicated a rook-ending draw that Kasparov also saw and, playing black, expediently offered. Because of the computer coding anomaly, Deep Blue thus gained only a half-point with the white advantage.
In Game 5, Deep Blue had a significant advantage in piece development, and made the controversial 11.h5 move pushing the black h-file pawn forward two squares down from its original position as shown in the figure at right, a move that startled its Grandmaster advisors. However, the move meant to Hsu that Deep Blue was warning the greatest player who ever lived, “if you castle kingside, I will attack you!” It was an audacious threat that no human player would ever dare level at Kasparov.
Indeed, after the match Kasparov lamented, "no computer plays h5!" But Hsu knew exactly what Deep Blue's hardware was doing,7
When I saw the move h5 from Deep Blue, I knew precisely what hardware evaluation features prompted the move. During the last two months of chip design before the rematch, I added drastic changes to the hardware for king safety evaluation. Before the king castles, the hardware computes three sets of king safety evaluations, one for kingside castling, one for queenside castling, and one for staying in the center. The real king safety evaluation is the weighted linear combination of the three, with the weighting based on the relative ranking of the three, and difficulty of making the castling moves. … In the game position, Deep Blue could always castle queenside safely, and therefore move h5 was perfectly capable from its point of view.
Although Kasparov later had a passed pawn ready to promote, Deep Blue just marched its king forward and initiated a drawing sequence based on repetition checks, so Game 5 was drawn, with Kasparov's white once again only salvaging a half-point.
After five games, the score was tied 2½-2½ and for the final Game 6, unless Kasparov playing black could defeat a Deep Blue with white advantage, history would be made with a computer tying or beating a reigning world champion.
Before Game 6, the commentators now almost all believed that once Deep Blue had the initiative, it could not be stopped. Indeed, Deep Blue was out of its opening book with 11.Bf4 as shown in the figure below at left, and seeing three pawns' worth of positional compensation, it was clearly in attack mode. Spurning material gains for an ultimate king kill, Deep Blue continued its attack, and at 19.c4 Kasparov resigned. History was made with the final board shown in the figure below at right.8
Despite a clear loss in Game 6 and the Match, a combative Kasparov refused to acknowledge Deep Blue's superiority, still believing that the IBM team had cheated. He had previously demanded to see Deep Blue's game logs during the Match, but IBM refused on the entirely reasonable grounds that doing so would be tantamount to revealing match strategy while the Match was ongoing, akin to a human telling his opponent his strategy and tactics during a match. IBM did agree to provide complete game logs after the Match to show that there was no in-game human intervention.
A rematch was discussed, but Kasparov demanded further bizarre perquisites for himself and his team, and proposed a three-week match with two- and three-day rest periods (obviously for Kasparov to recoup physically; Deep Blue needed no rest periods).9
Sought-after sponsors felt that such a rematch format was too long to hold public attention and were not forthcoming. IBM, which had put up the $700,000/$300,000 winner/loser prize for the Match, paid the considerable expenses of one of New York's poshest skyscraper hotels, and accommodated Kasparov and his entourage with every amenity, had already achieved its goal and was not enthusiastic.
After leaving IBM, in an effort to dispel all doubts of Deep Blue's superiority, Hsu personally tried to arrange a rematch with Kasparov, but was denigrated by Kasparov's manager as "lacking credibility" and as having insufficient funds for a prize (at least one million dollars).
Deep Blue's victory over Kasparov demonstrated that a top-down approach could defeat a human in a restricted domain, albeit with hardware and software fixes between games; Kasparov knew this and thus chose his sub-optimal move strategy.
Unsurprisingly perhaps, but anxiety-inducing nevertheless, if computers are the best players, won’t human matches lose their appeal? Will Grandmasters be replaced by computer scientists and their machines playing against each other for chess supremacy?
At least there was still some public interest in the match between the new world chess champion, Norway's Magnus Carlsen, and the American challenger Fabiano Caruana in the 2018 World Chess Championship. It ended, however, in twelve draws, ultimately being decided by a penalty-kick-like rapid chess confrontation (total 30 minutes per player) won by Carlsen.
Carlsen and other GMs nowadays do not play championship matches against computers, but rather use them to hone their skills. The exaltation of a human World Chess Champion no doubt has been eroded by computers, and perhaps even worse, because of similar training on them, the computer may also have had a hand in the many stultifying draws of high-level human matches.
Despite the historical significance of Deep Blue vs. Kasparov, the expert system revealed foibles that required human intervention to correct, and attempts to develop broad top-down expert systems would not fulfill their prospective destinies.
The height of optimism for top-down expert systems was on display in Japan in the early 1980s. It was a time of an ascendant Japan threatening to dominate the world economy with its semiconductors, consumer products, and cars. The Ministry of International Trade and Industry (MITI), in an attempt to administer Japan's coup de grâce for advanced-technology supremacy, announced a ten-year project to develop thinking machines to translate, converse, and reason for all manner of commercial activities. The culmination of the plan would drive Japan's industry away from manufacturing to an elite position as the fount of information technology, making Japan the foremost knowledge-based economy in the world through the widespread utilization and export of expert systems.10
This Fifth Generation of computing would have powers of reasoning based on symbolic inference systems connected to central knowledge-base machines, all connected to each other. The big idea was that natural-resource-poor Japan could thrive on the exploitation of its formidable human resources and, with the aid of expert systems, solve the problems of resource shortage, environmental damage, ageing populations, education, and language differences, and promote more efficient production and communications worldwide through the export of non-depleting and ever-expanding knowledge.11
In spite of dire warnings of being irrevocably left behind if it did not pursue a similar national plan, after several fits and starts, America's Fifth Generation counter project never got off the ground. And just as well, for the Japanese Fifth Generation Project petered out and was abandoned in 1991 after achieving none of its goals, and together with the collapse of the List Processor (LISP) artificial intelligence programming language market in 1987, MITI's grandiose plan not only registered the demise of the over-blown pure-play top-down expert systems, but also precipitated the deep freeze of the second AI Winter.
The problem may have been more fundamental than the lack of computing power, premature timing, and incomplete implementation: the Dartmouth choice of the Principia for proof of principle was based on Russell's logicism, where in effect basic axioms are thrown into the logic machine and, after undergoing logical processing, out come mathematical theorems! That is, mathematics depends entirely on the operations of pure logic, and this was the impetus for the search for all the fundamental theorems of mathematics, as pursued by great mathematicians such as Russell and David Hilbert, a program that came to be called formalism.12
However, in accord with Plato and Galileo's Heaven-sent "discovery" of mathematical forms, L.E.J. Brouwer and Jules Henri Poincaré believed that fundamental theorems and their associated mathematics are derived from intuition: that is, mathematics depends on logical operations which are derived from an intuition that is bestowed by Heaven only upon a very few humans. This so-called intuitionism was required because Russell's logicism inevitably led to contradictions, for example the paradox "I am lying", which if true is false, because the speaker is lying.
Thus the formalist claim that an expert system can be developed entirely from logical operations was, according to the intuitionists, impossible; a human element of intuition was needed, not necessarily from mathematical geniuses, but from the Big Data generated from the intelligence of many humans, which can be collated by bottom-up artificial intelligence.
The mathematical formalists believed that axioms operated upon by pure logic alone would produce theorems that lead to mathematical truth. The intuitionists invoked the Heavens to bestow certain humans with an intuition that, together with logic, would create the equations revealing the reality of mathematical operations. Top-down artificial intelligence relied on human-constructed expert systems operating to achieve a designated objective. Now bottom-up artificial intelligence relies on the accumulated information of Big Data that, upon undergoing algorithmic logic, will find the ground truth.
All of these approaches were manifestations of “intelligence”, and one man who was endowed with a fearsome human intelligence was to establish the fundamental concept that algorithmically linked the data to the ground truth.
The mathematics prodigy Norbert Wiener graduated from university in mathematics at age 14, further studied zoology and philosophy, could speak seven languages (and was said to be difficult to understand in all of them), wrote his Harvard dissertation on the mathematical logic of set theory, and received his Ph.D. at the tender age of 17.
He was soon a member of the mathematics elite, traveling to Cambridge to learn from the legendary philosopher/mathematician Bertrand Russell and the renowned pure mathematician G.H. Hardy, and thence to Göttingen, the European citadel of mathematics and physics, to study with the great David Hilbert.
With this transcendent résumé, the eclectic Wiener first taught philosophy at Harvard, but then following continually diverging interests, took jobs as an engineer at General Electric and of all things a reporter for the Boston Herald.
With America's entry into World War I in 1917, eager to serve, but failing enlistment because of poor eyesight, Wiener was invited by the mathematician Oswald Veblen to the Aberdeen Proving Ground to work on artillery shell ballistics, something that all governments bade their best mathematicians to do in wartime.
After the Great War, and after being rejected for a permanent position at Harvard, which he (and Albert Einstein) blamed on the anti-Semitic views of Professor G.D. Birkhoff, Wiener took a job as an instructor in mathematics at MIT.1
In the yeasty environment of Cambridge, Wiener joined regular meetings with intellectuals from many different disciplines, and with his own varied background, he became interested in the study of the boundary between different disciplines, not exactly interdisciplinary studies, but the nexus of biology, electronics, the new computers, and finally philosophical ruminations about the mind-body problem.
During World War II, Wiener returned to his ballistics specialty to study the vexing problem of hitting high-flying and fast-moving enemy aircraft with slow-to-respond anti-aircraft guns. The targets were always moving, and for fighters, not necessarily in predictable patterns, and different weather conditions, changing winds, and the proclivities of the anti-aircraft guns themselves, all amounted to a very complicated problem with which the manual-mechanical firing directors of the day could hardly cope.2
Here Wiener's own studies in zoology and his many discussions with biologists at the universities in Cambridge led him to the idea of the nexus between all creatures, including humans, and their environment, namely the adaptive feedback loop by which information is relayed from the environment to adjust responses in the pursuit of objectives. As Charles Darwin said in his Origin of Species:
It is not the strongest of the species that survive,
nor the most intelligent,
but the one most responsive to change.
It follows that any intelligent machine must be able to adapt, and so he created the discipline of cybernetics, the “steersman” of machines, exemplifying the monism of the mind-body.3
Wiener and colleagues thereupon devised a feedback control system that guided the anti-aircraft guns in response to the movement of the target aircraft's blip on a radar screen: a continually updated feedback loop to follow the target and then, through heuristics and some rapid calculations, predict its trajectory and shoot it down.
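The predict-and-correct loop at the heart of such a fire director can be sketched as a simple alpha-beta tracking filter; the one-dimensional target, gain constants, and `track` function below are illustrative assumptions, not Wiener's actual wartime design.

```python
# Alpha-beta tracker: predict the target's next position from an estimated
# velocity, then correct both estimates with each new radar observation.
def track(observations, dt=1.0, alpha=0.8, beta=0.3):
    x_est, v_est = observations[0], 0.0   # initial position/velocity estimates
    aims = []
    for z in observations[1:]:
        x_pred = x_est + v_est * dt        # predict one time step ahead
        residual = z - x_pred              # feedback: measurement vs. prediction
        x_est = x_pred + alpha * residual  # correct the position estimate
        v_est = v_est + (beta / dt) * residual  # correct the velocity estimate
        aims.append(x_est + v_est * dt)    # aim point for the next instant
    return aims

# A target moving a steady 3 units per tick is locked onto within a few steps.
obs = [float(3 * t) for t in range(10)]
aim = track(obs)   # the final aim point approaches the true next position, 30
```

The residual is exactly the feedback signal of the cybernetic loop: the gap between prediction and observation, fed back to adjust the next response.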
The great success, however, came not against human-piloted aircraft but rather against the Nazi V-1 flying bombs in the second Battle of Britain, where reportedly in one incursion nine of ten of the "flying bombs" were shot down by the new fire-director anti-aircraft guns.4
Warren McCulloch was a neurophysiologist, a graduate of Yale Medical School with a specialty in epilepsy and head injuries. His work led to an appointment as head of the University of Illinois' psychiatric research laboratory, seemingly a long reach from anti-aircraft gunnery. But after attending a lecture on Wiener's machine feedback theories, he was inspired to realize that although a single neuron in the brain, when activated in response to a stimulus, has no sense in and of itself, an aggregate of neurons and their synaptic connections form a pattern that does have a sense, and therefore an artificial neural network (ANN) in a machine should be able to store sense, and thus gain knowledge, just like a human brain.
With this idea in mind, McCulloch sought the help of the 18-year-old mathematics prodigy Walter Pitts, who noted that since a neuron's activation was either on or off, it could be binary coded, and the synaptic patterns could follow Boolean algebra to produce logical and desirable outcomes. Upon further rumination, he surmised that Claude Shannon's two-state on/off electronic switches could be arrayed to produce cascades of logical patterns, forming outcomes that could electronically model the human brain's ability to reason.
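The McCulloch-Pitts scheme is easy to sketch: binary inputs, a weighted sum, and a firing threshold. The helper names below are illustrative, but the point is theirs, that Boolean gates fall out of the choice of weights and threshold alone.

```python
# A McCulloch-Pitts neuron fires (1) when the weighted sum of its binary
# inputs reaches a threshold, and otherwise stays silent (0).
def mp_neuron(inputs, weights, threshold):
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

# Boolean logic follows from the weights and threshold alone.
def gate_and(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)

def gate_or(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)

def gate_not(a):
    return mp_neuron([a], [-1], threshold=0)
```

Cascading such units, as Pitts observed of Shannon's switches, composes these gates into arbitrary Boolean expressions.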
It was then just a logical step for the artificial neural network to store the reasoning patterns, and learn through a feedback loop the best synaptic patterns to reach the desired outcome of that reasoning, thereby endowing the machine with the adaptive intelligence to advantageously respond to the changing conditions it encountered in its environment.
In artificial intelligence circles, this came to be called the bottom-up approach of modeling the human brain by means of an artificial neural network that constructed synaptic patterns in response to external stimuli, learning bottom-up from supervised training from data sets, heuristics, and experimentation.
In order to improve and hasten learning, McCulloch's basic artificial neural network needed more sophisticated feedback. In the late 1950s, in another example of Wiener's interdisciplinary nexus, Cornell psychopathologist Frank Rosenblatt made the two-state artificial neuron more impressionable by attaching a weight to the artificial neuron's activation level, adding nuance to the artificial neural network's response; by "modulating" the feedback, the weights better present observations and data to the ANN, promoting more rapid learning of what is significant in the feedback information. Rosenblatt called his weighted artificial neuron a perceptron.
The perceptron was first used in computer vision. A bank of photoelectric cells focused on particular regions of a test image of two squares. The reflected light photons from the image were converted into analog electrical signals through the photoelectric effect, and those signals were digitized for greyscale-mapping onto a pixel matrix stored in the memory of an IBM 704 computer.
By electronically weighting each pixel's activation level to reflect the intensities of the light and dark patterns of the test pattern, iterative and cumulative repetition would produce an accurate rendition of the target squares on the computer's network of perceptrons, thereby producing the first instance of artificial neural network pattern recognition.
It is important to realize that the IBM 704 did not just reproduce the image for display on a video screen like a TV camera: its artificial neural network recognized the features of the two squares for storage in memory, and in principle, once having done so, the perceptron network had learned to identify patterns of that type as squares by comparing the pixel patterns of a newly viewed image with the stored squares pattern in memory, presumably useful for future identification of objects comprising square conformations.
In his 1969 book Perceptrons, MIT's Marvin Minsky argued, however, that the perceptron's combinatory binary circuits could not perform the XOR logical function and would thus be theoretically limited in application. Minsky's thesis was later overcome by layered networks of perceptrons, but not before it chilled AI neural network research into a ten-year AI Winter, which thawed only to encounter the ill-fated Japanese Fifth Generation project begun in 1981, whose failure after another ten years triggered yet another AI Winter of the early 1990s.5
After Minsky's false Winter, the California Institute of Technology in 1982 rehabilitated the artificial neural network, modeling memory as variable energy levels of artificial neurons in a synaptic pattern, a strong proof-of-concept impetus for the use of perceptrons in artificial intelligence.
Bottom-up feedback artificial intelligence is best exemplified by the autonomous car, which senses the environment to provide feedback to control the car and learns how to drive from a mapping of its motion through the environment, controlling the vehicle using actuators and servos, rewarding good responses and penalizing poor ones, and then storing the driving data for later supervised learning.
The autonomous car is easily identified by its LIDAR tower sweeping out semiconductor-diode-produced laser beams to probe and map the surroundings. The microsecond-pulsed, Class I low-powered, rapidly rotating 905-nanometer-wavelength laser beam is used for shorter-range (~50 m) reconnaissance, and is in conformance with the US Food & Drug Administration's eye safety standard and the International Electrotechnical Commission's 60825 performance standard. The higher-powered, longer-wavelength 1550 nm laser beam is more commonly used for longer-range probing (~200 m) because longer wavelengths are closer to radio waves, which can pass through obstacles.6
A rapidly rotating electronic device cannot avoid the design problem of wires winding up while maintaining contact connections, so the laser beam is instead rotated by reflection from rapidly rotating micro-electromechanical systems (MEMS) mirrors.
The laser beam is reflected by objects in the environment back to the tower, where photodetectors pick up the return beams and, in accord with the photoelectric effect, convert the light intensity to a proportional electric current. From the time of the returned light signals, the distance of scanned objects is measured, and from the change in wavelength upon reflection by a moving object, the motion of scanned objects can be computed using the Doppler effect for light.7
After collecting all the electromagnetic wave data and performing the necessary wave calculations and mapping, the LIDAR system forms a dynamic 3-Dimensional point-cloud image of the car's driving environment.
On-board computers run simultaneous localization and mapping (SLAM) software to dynamically track the car's position, and actuators and servo-motors control the steering, speed, and braking of the car in response to the changing environment as displayed on the point-cloud map and the constraints of embedded traffic rules.8
Together with GPS location, local navigation street maps, and training sets of actual driver experience responding to the constantly changing environment, the autonomous car learns how to drive in many different environments and situations through numerous training sessions. Clearly, the more training data, the more accurate the response of the autonomous car will be; just as with a human driver, the more driving experience, the more proficient the driving. The self-driving car learns to drive like a human from the bottom up, and gains greater skill by driving more and encountering more varied situations.
In the future when sufficient numbers of autonomous cars are on the road, a self-driving swarm of cars will all share in and contribute to the entire group's driving skill set, and in principle could never have an accident.
A swarm of bees can unerringly home in on its target, and in the process the bees never bump into each other or impede the progress of the whole. Of course, individual cars on their own particular routes will deviate from the swarm formation, but again, with larger experience datasets, anomalous behavior such as turning into a parking lot or suddenly changing lanes can be modeled as well in patterns of legal driving activity, while the reckless and drunken driver can be immediately spotted by the (anti-bottoms-up!!) swarm system.
Waymo's new self-driving cars can further improve their driving skills through Darwinian evolution by employing so-called population-based training (PBT), originally developed by DeepMind for video-game playing. Improvement in driving skill is accelerated by selectively drawing from the "fittest specimens" in the driving population, in tune with "survival of the fittest" in a video game, to retrain and recalibrate autonomous driving for optimum safety.
Even with an occasional glitch, the autonomous car system will at least reduce the nine deaths and one thousand injuries every day in the United States attributable to distracted drivers.9
To date, the autonomous car combines top-down sensing, feedback, mapping, and control systems with bottom-up training for supervised and reinforcement learning; it has not yet employed unsupervised learning to further augment its driving skills through its own driving actions.
The functional operations model for a thinking machine can be represented by a simple mapping function, y = f(x), where given the input variables x, the stimulus, the model f(x) maps them onto the output y, the response, as schematically shown in the figure below.
In supervised learning, if the output y does not accurately match the labeled input data, the difference between the output and the labeled training data (the error) will be minimized and the adjusted new data fed back into the algorithm as shown in the figure. If after many iterated runs of error minimization, the algorithm still does not produce accurate results, then the algorithm itself may require a system calibration.
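The error-minimization loop just described can be sketched for the simplest model, a line f(x) = wx + b trained by gradient descent on squared error; the data, learning rate, and epoch count below are illustrative.

```python
# Gradient-descent training of the line f(x) = w*x + b on labeled data:
# each epoch computes the mean squared error's gradient and feeds the
# correction back into the parameters, iteratively minimizing the error.
def train(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w   # adjusted parameters fed back into the model
        b -= lr * grad_b
    return w, b

# Labeled training pairs generated by the ground truth y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train(xs, ys)   # w approaches 2 and b approaches 1
```

If such a loop stalled far from the labels no matter how long it ran, that would be the cue for the "system calibration" of the algorithm itself mentioned above.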
After supervised training, the Model presumably has learned the common characteristics of the training set data, and is able to generalize those characteristics to recognize new images, probability distributions, or perform classification on new unstructured and unlabeled data presented to the Model.
In artificial neural networks, the mapping function f(x) is constructed by training the network on a set of labeled data; that is, by induction, abstracting the common characteristics found in the labeled training-set data to form a model generalization that recognizes newly presented data by deduction, going from the model to the particular case.
A mathematical definition of generalization is,1
There exists a set of elements that possesses common characteristics shared by those elements sufficient to form a conceptual model that can perform deductive inferences
In other words, the training set data must have sufficient common characteristics to configure the model so its deduced generalizations can classify new data as belonging to some set, or predict the consequences that any new data implies.
How well the model f(x) can classify and predict is determined by the goodness of fit of the model on new data. The model underfits when it cannot generalize anything from the training dataset, or it can overfit, meaning that it is tied too closely to the training set and can only recognize new data that is almost exactly the same as the training set data. Like Goldilocks, who chooses the porridge that is neither too hot nor too cold, a well-fitted model applied to new data fits just right; that is, it recognizes what it should recognize.
A model is said to underfit when it cannot adequately capture the abstract common characteristics of the training data. In statistical analysis terms, underfitting models have low variance and high bias, meaning that they do not change much in absorbing the training set data (low variance) and the model's assumption about the data is too strong (biased towards itself).
For example, if a very widely distributed set of data points is modeled by a straight line, the model persists no matter how varied the distribution is, and it makes too strong an assumption that the data can be generalized by a simple straight line, whereas, for instance, a polynomial curve might better fit the data.
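The straight-line-versus-polynomial contrast can be sketched numerically; the quadratic data below is synthetic and purely illustrative, with NumPy's least-squares polyfit standing in for any curve-fitting routine:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = x**2 + rng.normal(0.0, 0.5, x.size)  # quadratic data with a little noise

# Underfit: a straight line makes too strong an assumption about the data.
line = np.poly1d(np.polyfit(x, y, 1))
# Better fit: a degree-2 polynomial matches the underlying structure.
quad = np.poly1d(np.polyfit(x, y, 2))

def mse(model):
    """Mean squared error of the fitted model on the data."""
    return float(np.mean((model(x) - y) ** 2))

print(mse(line) > mse(quad))  # True: the line leaves far more residual error
```

The higher-degree model is not automatically better, of course: pushed to degree 29 on these 30 points, the polynomial would interpolate the noise exactly, which is precisely the overfitting discussed next.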
Underfitting is easy to detect from the model's poor performance on the training set data; its classifications or predictions do not match the labels on the training data. Underfitting is commonly caused by under-training the model, and the remedy is simply to provide more training set data and more epochs (runs) through that data. The desired generalizations may then emerge because more training has rendered the model more sensitive, and thus able to more accurately recognize data for what it represents.
Overfitting is the statistical opposite of underfitting in that it has high variance and low bias, meaning that the model has incorporated irrelevant data or noise (high variance) and thus follows the training set data too closely when presented with new data, so that the model's assumptions are weak (low bias towards itself). The result is that the over-fitted model will not recognize new data that is actually within the purview of the model's desired generalization.
A definition of overfitting from the Oxford Dictionary is,2
The production of an analysis that corresponds too closely to a particular set of data, and may therefore fail to fit new data or predict future observations reliably.
Again, the machine learning model overfits when it unwittingly incorporates irrelevant detail or noise in the training set data into its generalization as if those were part of the essential abstracted common characteristics of the training data. Overfitting means that the model has been over-trained; it is like the diligent but dull schoolboy who simply memorizes the math problem solutions in the textbook instead of learning the abstract generalizations from the problems so that he can apply them to new problems on an examination.
An example of overfitting in bottom-up artificial intelligence is an overly detailed decision tree that includes too many branches and leaves irrelevant to generalized abstraction and contains incidental noise, so much so that in the jumble of extraneous information, it is difficult to recognize new data for what it represents.
The remedy of course is to prudently prune the decision tree, for example by alpha-beta pruning of unproductive branches. Prudence is exercised by resampling, for instance k-fold cross validation, which is just a fancy way of saying: take different subsets of the training data, train and test on the subsets k times, observe the model performance on the subset test results, and then make appropriate adjustments to the overall model.
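The resampling recipe can be sketched in a few lines of Python; the data items here are bare indices, and whatever model the tree or network provides would be trained and tested on each split (a sketch, not a full validation harness):

```python
import random

def k_fold_splits(data, k, seed=0):
    """Shuffle the data and partition it into k disjoint folds; each fold
    serves once as the held-out test set while the rest is the training set."""
    data = data[:]                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Example: 10 labeled samples, k = 5 gives five train/test splits.
samples = list(range(10))
splits = list(k_fold_splits(samples, 5))
print(len(splits))  # 5
```

Each sample appears in exactly one test fold, so the k test scores together estimate how the model will fare on data it has never seen.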
Another overfitting cure is to separate out a larger subset of the training data to use as a validation dataset for testing different models, each trained on the rest of the data and developed by minimizing its error function. The performance of each model is then tested on the validation dataset to see which model has the smallest error with respect to the labeled dataset. The validation dataset has become an essential tool for promoting machine learning algorithm accuracy.
The model fits well when its performance on the training and validation datasets improves over time. However, it is important to note that simply doing more training can actually decrease the model's performance through overtraining that overfits the training set data. Furthermore, doing too many validation dataset runs may well cause the model to learn the validation data as well as the training set data, like the clever schoolgirl who gleans from the quizzes what will be on the final examination.
In this case, a test dataset that is independent of the training dataset but follows the same probability distribution as the training data set can be run. If the results are similar to the training data set run, that is an indication that the model does not overfit. This is the case of the brilliant schoolgirl who has learned how to generalize all the subject matter of a course, thereby understanding the essence of the subject.
Finding the optimum training regime can be achieved by observing the rate of improvement of the model in accurately recognizing the labeled training set data over the training epochs, and when the rate of improvement approaches zero, the training should be stopped to avoid overfitting. This is like the able schoolteacher who perceptively introduces more advanced topics to avoid boring the students with routine material.
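The stopping rule, halting once the epoch-to-epoch improvement approaches zero, can be sketched as follows; the accuracy history is a hypothetical per-epoch log, not data from the text:

```python
def should_stop(accuracy_history, tol=1e-3, patience=3):
    """Return True once the epoch-to-epoch improvement in accuracy has
    stayed below tol for patience consecutive epochs (early stopping)."""
    if len(accuracy_history) <= patience:
        return False
    recent = accuracy_history[-patience - 1:]
    gains = [b - a for a, b in zip(recent, recent[1:])]
    return all(g < tol for g in gains)

# Improvement has flattened out, so training should stop to avoid overfitting.
history = [0.60, 0.75, 0.84, 0.879, 0.8795, 0.8797, 0.8798]
print(should_stop(history))  # True
```

In practice the monitored quantity is usually the validation accuracy rather than the training accuracy, so that the stop is triggered by the onset of overfitting rather than by mere saturation on the training set.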
Underfitting and overfitting are the twin gremlins plaguing artificial neural networks; fortunately, in addition to validation and test datasets, there are many ways of dealing with them in machine learning modeling.
Different weights and biases parameter initializations can be employed, such as Bayesian, Gaussian Mixture Model, and Factor Analysis, to tune the artificial neural network and expel the gremlins.3
Both model underfitting and, particularly, overfitting can also be alleviated by regularization methods such as the L1 and L2 penalties, dropout, and artificial expansion of the training data.
Too few neurons in the hidden layers may result in feature maps extracted from the data that miss significant characteristics, seriously underfitting the input data. On the other hand, using too many neurons in the hidden layers can result in activating irrelevant data and noise, seriously overfitting the data. In this case, the network has so much processing capacity that the data in the training set is too limited to train all the neurons in the hidden layers, and the neurons "find" and falsely interpret extraneous data.
The overfitting of training set data by excess neurons can be alleviated simply by dropping out a randomly-chosen set of activations in a layer, setting them all to zero. This is therefore a test of network perspicuity, in that the model should be able to choose the right generalizations regardless of the absence of some activations.4
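Dropout can be sketched directly; this is the common "inverted" variant, which in addition to zeroing the dropped activations rescales the survivors by 1/(1−p) so that their expected sum is unchanged (the layer values are illustrative):

```python
import random

def dropout(activations, p=0.5, seed=0):
    """Zero each activation with probability p; scale each survivor
    by 1/(1-p) so the expected total activation stays the same."""
    rng = random.Random(seed)
    out = []
    for a in activations:
        if rng.random() < p:
            out.append(0.0)              # this neuron is dropped for this pass
        else:
            out.append(a / (1.0 - p))    # survivor, rescaled
    return out

layer = [0.2, 1.5, 0.7, 0.9]
thinned = dropout(layer)
print(len(thinned) == len(layer))  # True: shape is preserved, values thinned
```

At inference time dropout is switched off entirely; the rescaling during training is what makes that possible without adjusting the weights.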
In dropout, different sets of neurons are dropped, so it is like training a different neural network after each dropout event; these different networks will tend to overfit the data in different ways, so taking their average results will, curiously, alleviate the overfitting. As one of the pioneers of modern AI and a recipient of the 2018 Turing Award, Yann LeCun explained,
This technique reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons. It is, therefore, forced to learn more robust features that are useful in conjunction with many different random subsets of the other neurons.
To paraphrase Henry Kissinger on Chancellor Gustav Stresemann's astute handling of the very sensitive issue of German disarmament after World War I, “Over time [his] tactics become strategy and the expedient, conviction”.5
On the other hand, artificially expanding the training data can make the artificial neural network model more realistic, for example in improved speech recognition, adding background noise to best-fit real-life listening situations.
Overfitting can also be ameliorated by regularization techniques such as weight decay (L2 regularization), which adds the sum of the squares of the weights as a term to the cost function, thereby stabilizing its gradient descent; this proves useful when different runs of the artificial neural network produce quite different results. In L1 regularization, the cost function is instead modified by adding the sum of the absolute values of the weights to stabilize its gradient descent backpropagation.6
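Both penalties amount to one extra term in the cost function; a minimal sketch, with a hypothetical base cost and weight vector, where the L1 penalty sums absolute values and the L2 penalty sums squares:

```python
def regularized_cost(base_cost, weights, lam=0.01, kind="L2"):
    """Add a weight penalty to the cost function: the L1 penalty is the sum
    of absolute values, the L2 penalty the sum of squares, scaled by lam."""
    if kind == "L1":
        penalty = sum(abs(w) for w in weights)
    elif kind == "L2":
        penalty = sum(w * w for w in weights)
    else:
        raise ValueError("kind must be 'L1' or 'L2'")
    return base_cost + lam * penalty

w = [0.5, -1.2, 2.0]
cost_l1 = regularized_cost(1.0, w, lam=0.1, kind="L1")  # 1.0 + 0.1 * 3.7
cost_l2 = regularized_cost(1.0, w, lam=0.1, kind="L2")  # 1.0 + 0.1 * 5.69
```

Because the penalty grows with the weights, gradient descent is pushed towards smaller weights, taming the jittery, noise-chasing solutions of an over-fitted network.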
Mathematical models of macroscopic natural phenomena and engineering systems almost always take the form of second-order partial differential equations. For deterministic physical systems, those equations are mostly linear, meaning that the coefficients of the derivative terms are constants or functions only of an independent variable (and not a dependent variable), and there are no derivatives multiplying each other or raised to the second, third, or higher powers.
For relatively simple systems, linear differential equations often could be solved in closed form, meaning in terms of the elementary functions of polynomials, sine, cosine, exponentials, natural logs, and combinations thereof, and these solutions would fully describe the physical situation under different initial and boundary conditions, thereby allowing mathematical modeling and prediction.1
However, most physical systems in real life are non-linear, and thereby not solvable in closed form. So, using finite differences, first as tiny mechanical turns of the cogwheels of the early differential analyzers, and later by digitally incrementing the independent variable and correspondingly differencing the dependent variable, the equations were solved numerically: at first very slowly by hand, by young women working with adding machines, and later very quickly by digital computer programs, taking derivatives to hasten convergence.2
Adding to the difficulties, the complexities of real-world situations involve many factors represented by multiple components and terms in the differential equations, many different independent and dependent variables, and all manner of associated parameters, with each situation requiring different initial and boundary conditions.
Newtonian mechanics furthermore does not hold in the world of molecules, atoms, and nuclei, where the quantum mechanical second-order differential equations of Schrödinger and the matrices of Heisenberg deal with the Born probabilities of microscopic wavefunction formation and collapse, rather than the determinism of a Newtonian macroscopic calculation.
For many-body systems, however, a statistical average could represent the overall behavior of the system in terms of macroscopic characteristics; for example, the statistical mechanics-derived pressure, volume, and temperature of a thermodynamical description of gases composed of billions of microscopic atoms.
The Atomic Bomb releases energy because in the splitting of the nucleus of highly fissile radioactive uranium-235 (235U) or plutonium (239Pu and 241Pu), the masses of the fission fragments are less than the mass of the original nucleus. That mass defect, according to Einstein's famous mass-energy equivalence equation E = mc2, is equivalent to a release of the binding energy of the protons and neutrons in the original nucleus.3
The Atomic Bomb explodes when a nucleus is split and the protons and neutrons of 235U are released; although the positively-charged protons tend to avoid collisions with other protons, the neutral neutrons will collide with other 235U or 239Pu nuclei, instigating further nuclear fission and again releasing copious protons and neutrons. When the nuclear fission effective neutron multiplication factor k = 1, the amount of 235U or 239Pu has reached a critical mass, resulting in a spontaneous chain of fission reactions that suddenly releases all the proton and neutron binding energy in a nuclear explosion.
In designing the A-Bomb, scientists first tried to use second-order differential and integral equations to deterministically follow the neutrons in their collisions with the uranium nuclei and the subsequent release of more neutrons. However, in the spreading cascade of collisions where each collision could produce more than a billion more neutrons, and each neutron could go on to collide with billions of other uranium nuclei, and the collisions would produce even more billions of neutrons in ever-spreading branches of a tree of subsequent collisions, it was clearly not amenable to a deterministic kinematic (study of motion) calculation of each and every neutron.
Furthermore, each collision, fission, and release of neutrons event was quantum mechanical and thus could not be absolutely determined, only the probabilities of collision events were known, and all of the processes were further subject to the initial conditions in the reaction chamber and the boundary conditions of the Bomb casing.
One of those participating in the development of the Atomic Bomb was the mathematician Stanislaw Ulam, an immigrant from a Poland threatened by Germany in 1935. He joined the Los Alamos team in 1943, taking up work on analyzing the neutron flux problem, but with no more success in the kinematics study than the others.
Of course, the Manhattan Project ultimately produced a sound design based on the temperature, pressure, and density requirements of the neutron gas flux and the volume of the bomb cavity, practically engineered through experimentation.
The uranium critical mass neutron flux problem stuck with Ulam, however, and in 1945 when struck with viral encephalitis, he was confined to hospital with nothing to do but play solitaire all day, which he did, over and over again.
Thinking like the mathematician that he was, Ulam wondered if there was any way he could find the keys to successfully completing each game, so he began to record the sequence of cards, their play, and the game's final layout. He came to believe that if he could observe a large enough number of games, he might statistically discover some patterns of play that led to success, and then model the card layouts analogously to the statistical mechanics of thermodynamic pressure, temperature, and volume describing a gas as a whole, thus describing the overall behavior of solitaire games.
Because the cards in a 52-card deck can be arranged in more ways than the number of atoms in our galaxy (about 10^67), and a good shuffle results in a completely unpredictable initial arrangement every time, the sequence of cards drawn from the deck will always be random. And although there are only a few placement possibilities for a drawn card, there are different strategic probabilities for each play as to its contribution to a successful conclusion. Furthermore, the card placement choices depend on the preceding layout state of the cards, and a card placement changes the layout state. All of these probabilities and card layouts are different for each game, so an astronomical number of games would be required to discover any macroscopic keys to success, if such indeed existed.
Ulam realized all of these conditions would have to be included in any model of the game. He knew that a Markov chain is a stochastic ensemble of a sequence of random variables Xi (the cards) belonging to a finite configuration space (the card layouts) of separate nodes connected by chains (the card placement possibilities), among which an agent can choose to walk, but with assigned probabilities Pij of steps to the other nodes, whereby the probability distribution of states in the process depends only on the present state (the card layout) and not on any past states (the Markovian property).
A schematic example of a Markov chain is shown in the figure below with its transition matrix of probabilities P, where the elements of P represent the probabilities of a step from A to B and C, from B to A and C, from C to A and B, and, in every case, from a node back to itself.
Mathematically, the individual elements of the transition matrix P are given by Pij = Pr(Xn+1 = j | Xn = i), the probability of stepping to node j given that the agent is currently at node i, with each row of P summing to 1.
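The three-node chain in the figure can be walked in code; the transition probabilities below are hypothetical stand-ins for the figure's matrix, with each row summing to 1:

```python
import random

states = ["A", "B", "C"]
# Row i of P holds the probabilities of stepping from state i to A, B, C.
P = {"A": [0.2, 0.5, 0.3],
     "B": [0.4, 0.1, 0.5],
     "C": [0.3, 0.3, 0.4]}

def walk(start, n_steps, seed=42):
    """Walk the chain: each next node depends only on the current node
    (the Markovian property), never on the path taken so far."""
    rng = random.Random(seed)
    node, path = start, [start]
    for _ in range(n_steps):
        node = rng.choices(states, weights=P[node])[0]
        path.append(node)
    return path

path = walk("A", 10)
print(len(path))  # 11 nodes: the start plus ten steps
```

Because every step consults only the current node's row of P, the walk never needs to remember its history, which is what makes long chains cheap to simulate.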
If there is a positive probability, no matter how low, that an agent can move to any other node in the chain, the chain is irreducible; if the agent cannot go around endlessly in cycles of the same nodes, the chain is aperiodic; and if the agent can explore every node, the chain is ergodic.4
The Markov chain was a perfect model for solitaire, as it completely satisfied the game's requirements. A deck of cards after shuffling is randomly distributed, but in playing solitaire there is a restricted domain of possible placements of the card in hand that go towards reaching the final objective. Ulam found that within the domain of possible inputs (the playing of a card), there would be a definite probability for each play commensurate with its efficacy in reaching the objective of all the cards resting in the final array; and since the computation is deterministic, the same output being produced for a given input according to the rules of solitaire, he could model the game using a Markov chain.
As for unpredictability, the random distribution of cards in a shuffled deck could be taken by sampling from a random distribution of numbers representing the cards; then, by repeated sampling and aggregating the results, if he could simulate a sufficient number of games, he could discover the keys to success, if such existed.
A simulation employing a random distribution is called "Monte Carlo" after the famed casino in Monaco, the gambling capital of Europe, evincing the gambling industry's propaganda of a completely random chance at winning, something that does not include the assured profits of the casino, based as they are on the Law of Large Numbers. Also called Bernoulli's law, it is the ultimate regression to the mean of any activity after a very large number of trials (the mean in gambling is always with the house), and is the foundation of the efficacy of a Monte Carlo simulation of a complex system.5
As the size of a statistical sample N approaches infinity, the variance σ2, the average of the squared deviations from the arithmetic mean μ and thus a measure of the variability of the data, will approach zero, and the regression to the mean will approach the true probability of the activity. The standard deviation σ, which measures the dispersion of the data, is just the square root of the variance, σ = √(Σi (xi − μ)2 / N).
The variance, as a sum of squared differences, is always positive, and squaring furthermore highlights the outliers from μ; minimizing it regresses to the mean (if the absolute value is used instead of the square, the minimization regresses to the median rather than to the mean).
As the number of trials approaches infinity, the variance approaches zero; so in the example of a fair coin toss, the probability of heads or tails eventually proceeds to the mean probability of 0.5, with vanishing variance, as shown schematically in the figure below.
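The coin toss convergence is itself a one-line Monte Carlo experiment; the seed is arbitrary:

```python
import random

rng = random.Random(0)

def heads_fraction(n):
    """Fraction of heads observed in n simulated fair coin tosses."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

small = heads_fraction(100)        # noisy: deviations of a few percent are typical
large = heads_fraction(1_000_000)  # the law of large numbers pulls this near 0.5
print(abs(small - 0.5), abs(large - 0.5))
```

The deviation from 0.5 shrinks roughly as 1/√N, which is why Monte Carlo accuracy is bought only with large numbers of trials.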
The detonation of an atomic bomb could be similarly simulated using Markov chain Monte Carlo simulation (MCMC) because there is a domain of possible interactions in fission reactions and neutron collisions that depend on the probabilities of the particles’ elastic scattering (change direction but not energy), inelastic collisions with nuclei (change of direction and energy), absorption by the nuclei, and possible fission of the nuclei, analogous to the choice of card placement in solitaire.
As a simplified example, a source of neutrons passes through a cavity containing uranium-235, and the neutrons may either be scattered or absorbed by the uranium nuclei, the latter possibly resulting in fission depending on the energy of the neutrons and the accumulating neutron flux density. The problem thus is one of neutron diffusion and fission multiplication of neutrons, the probabilities of which depend on previously derived experimental results. The mean distance that a neutron will travel without being scattered or absorbed (the mean free path) and the probability of a neutron collision with a uranium nucleus (the collision cross-section) are also dependent on the kinetic energy of the neutrons.
It is known (and completely believable) that the probability density P of a neutron traveling a distance x before being scattered or absorbed (the free path) decreases exponentially depending on the density of nuclei ρ and their collision cross-section ζ; the infinitesimal probability for an infinitesimal travel distance dx thus is given by dP = ρζ e^(−ρζx) dx.
Integrating this equation gives the density of the neutron flux (the number of neutrons passing per unit area) as N(x) = N0 e^(−ρζx).
Now xi, the free path length for trial i, which lies on the interval (0, ∞), can be represented by a computer-generated sequence of pseudorandom numbers ξi uniformly distributed on the interval (0, 1) by making the transformation xi = −ln(ξi)/(ρζ).
That is, through this variable transformation the neutron free path can be expressed by pseudorandom numbers. For example, suppose experimental data for neutron collisions with uranium isotope nuclei shows a 0.9 probability of scattering and only a 0.1 probability of absorption leading to fission, and the ξi interval (0, 1) is segmented into the two groups (0, 0.1) and (0.1, 1). If the pseudorandom number generated by the computer is, say, 0.2, then it belongs in the second, larger group (0.1, 1), meaning that the neutron has been scattered. Repeating for more and more sets of pseudorandom numbers will give better and better approximations of the neutron flux equation above.
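The inverse transformation and the (0, 0.1)/(0.1, 1) segmentation can be sketched together; the value of the product ρζ below is a hypothetical stand-in for real material data:

```python
import math
import random

RHO_ZETA = 0.5   # hypothetical product of nuclei density and collision cross-section
P_ABSORB = 0.1   # absorption probability from the 0.9/0.1 example above

rng = random.Random(1)

def free_path():
    """Map a uniform pseudorandom number in (0, 1] to a free path length
    on (0, inf) via the inverse of the exponential distribution."""
    xi = 1.0 - rng.random()          # in (0, 1], so log() is always defined
    return -math.log(xi) / RHO_ZETA

def collision_event():
    """Segment (0, 1): a draw below 0.1 means absorbed, otherwise scattered."""
    return "absorbed" if rng.random() < P_ABSORB else "scattered"

trials = [(free_path(), collision_event()) for _ in range(10_000)]
scattered = sum(1 for _, e in trials if e == "scattered") / len(trials)
print(scattered)  # close to the experimental scattering probability of 0.9
```

Each trial stands for one neutron history; aggregating thousands of such histories is exactly the flux estimate the text describes.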
If a neutron passes through, it is given a "score" s = 1, and if absorbed, s = 0, so the probability of contributing to the neutron flux is given by the mean score over all trials, with the error measured by the variance.
Performing the same type of transformation and scoring for the scattering angle, the directions of the scattered neutrons can also be simulated, although the possibility of neutrons being scattered back into the neutron flux complicates matters and must be accounted for in more detailed calculations.
More complex situations, including protons, different cavity designs, varying initial conditions, and so on, can be modeled through previously experimentally-derived cross-section and mean free path parameters for the different types of reactants and cavity configurations, entered into the MCMC simulation.6
From this example, it can be seen that the neutron's travels are not merely a random walk; that is, there are different probabilities for the steps taken to succeeding nodes, and so just given the relevant event probabilities, any system of many interacting entities can be simulated.
Now, by sampling from a normal probability distribution over many runs of the MCMC simulation, the vast number of particles in the atomic bomb's neutron diffusion flux (10^15–10^25) can be modeled by sampling only 10^5–10^8 trajectories, with the conditions for critical mass represented by a function f(X) whose sequences of random samples Xi approximate a desired probability function P(X), where f(X) is proportional to P(X).7
The desired probability P(X) for critical mass can be thought of as a probability density that is reached by f(X) stepping through the Markov chain in steps commensurate with the greater probability (generally towards the peak of a random distribution), iteratively pushing f(X) closer to P(X). Consistent with Bernoulli's law, the MCMC simulation in aggregate will regress to the mean probability of attaining a critical mass of uranium nuclei producing a neutron diffusion flux sufficient to form a self-sustaining chain reaction and subsequent detonation.
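This stepping-towards-the-peak procedure is the Metropolis algorithm at the heart of MCMC; a minimal one-dimensional sketch, with a hypothetical unnormalized target f(x) standing in for the physics:

```python
import math
import random

rng = random.Random(7)

def f(x):
    """Unnormalized target, proportional to the desired P(X); here a
    Gaussian bump centered at 0, chosen purely for illustration."""
    return math.exp(-0.5 * x * x)

def metropolis(n_samples, step=1.0):
    """Metropolis sampling: always accept uphill moves, accept downhill
    moves with probability f(proposal)/f(current)."""
    x, samples = 0.0, []
    for _ in range(n_samples):
        proposal = x + rng.uniform(-step, step)
        if rng.random() < f(proposal) / f(x):
            x = proposal
        samples.append(x)
    return samples

chain = metropolis(20_000)
mean = sum(chain) / len(chain)
print(mean)  # the chain settles around the peak of f, near 0
```

Notably, only the ratio f(proposal)/f(current) is ever needed, so the normalizing constant of P(X), hopeless to compute for a bomb's neutron flux, never has to be known.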
A large number of neutron trajectories are constructed by sampling from the experimental probability distributions of parameters as the neutrons travel through the bomb cavity. The chain reaction therefore depends on the aggregate outcomes of the total set of trajectories.
The random distribution ensures that the actual probability of a chain reaction will be "covered" by the 10^5–10^8 trajectories, and with the ever-increasing computational capabilities of computers, including the supercomputers used for nuclear weapons research, the number of trajectories can be increased to better satisfy the law of large numbers.
Running the Monte Carlo simulation over the atomic bomb Markov chain many, many times will drive the variance of outcomes to zero and thereby reveal the mean probability of attainment of the chain reaction required for producing the detonation of the Bomb.
If the probability is low, then the thermodynamic and bomb cavity engineering parameters can be adjusted and the simulation run again and again until critical mass is attained and the process conditions for Critical, Super- and Sub-Critical situations are known.
It is paramount to know the conditions for detonation so that the bomb will detonate when desired and not in the laboratory, so a simplified plot of Mass/Energy vs. Time must be calculated from the MCMC simulation of neutron diffusion, as shown in the figure below.
The MCMC was developed too late for the atomic bomb, but beginning in 1951 the development of the hydrogen bomb required fission reactions to generate the radiation to trigger the fusion of the hydrogen isotopes deuterium and tritium.
Ulam first ran the Metropolis-Hastings MCMC algorithm for a system of many interacting particles on John von Neumann's MANIAC computer (derived from the IAS computer at Princeton) and later successfully completed the simulation on the ENIAC computer at the University of Pennsylvania.
The H-bomb detonation was based on the idea of radiation implosion, rather than neutron flux alone, to produce fusion. There is controversy over whether the proposal was Edward Teller's or Stanislaw Ulam's, but Teller's insight regarding Ulam's Markov chain Monte Carlo simulations was spot-on,8
Take advantage of the statistical mechanics and take ensemble averages instead of following detailed kinematics
In summary, the Markov chain is a finite ensemble of connected nodes each having an assigned probability of choice whereby the probability distribution of nodes depends only on the present state of nodes and no past states. The Monte Carlo simulation depends on a transformation of variables allowing pseudorandom numbers generated by a computer to represent agents in a complex process, with the law of large numbers ensuring that repeated trials will drive the system to the mean, thereby revealing the ground truth of the process; in the example, instead of the benign outcome probability of a fair coin toss, the horrific detonation of an atomic or hydrogen bomb.
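Both halves of this summary can be illustrated with a toy chain. The two-state transition probabilities below are invented for the sketch; the point is only that the next state depends on the current state alone, and that a long Monte Carlo walk settles toward the chain's mean behavior.

```python
import random

# Two-node Markov chain: P[i][j] = probability of moving from node i to j.
# The choice at each step depends only on the present state, never the past.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def visit_fractions(steps, state=0, seed=42):
    """Walk the chain with computer-generated pseudorandom numbers and
    return the fraction of time spent in each node."""
    rng = random.Random(seed)
    visits = [0, 0]
    for _ in range(steps):
        visits[state] += 1
        state = 0 if rng.random() < P[state][0] else 1
    return [v / steps for v in visits]
```

For these transition probabilities the long-run occupancy works out to (5/6, 1/6), and repeated trials converge on that mean as the law of large numbers dictates.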
An analogy with the bombs can be made for a presidential election prediction. Survey samples from the general population are constructed from demographics such as party affiliation, gender, economic class, ethnicity, and so on, providing in effect event probabilities just as the mean free path and collision cross-sections for neutrons and uranium isotope nuclei have probabilities for scattering or absorption and subsequent fission.
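The analogy can be made concrete with a toy simulation. The demographic shares and vote probabilities below are made up for illustration; each sampled voter plays the role of a neutron trajectory drawn from its event probabilities.

```python
import random

# (share of electorate, probability of voting for candidate A) per group —
# the analog of scattering/absorption/fission probabilities per collision.
GROUPS = [(0.40, 0.55), (0.35, 0.48), (0.25, 0.42)]

def simulate_vote_share(n_voters=100_000, seed=7):
    """Sample individual 'voter trajectories' from the group probabilities
    and return candidate A's simulated share of the total vote."""
    rng = random.Random(seed)
    votes_a = 0
    for share, p_a in GROUPS:
        for _ in range(int(share * n_voters)):
            if rng.random() < p_a:
                votes_a += 1
    return votes_a / n_voters
```

The expected share here is 0.40 × 0.55 + 0.35 × 0.48 + 0.25 × 0.42 = 0.493, and a large sample lands close to it.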
Aside from nuclear weapons and the risks inherent in democratic elections, Monte Carlo simulation has been used in the more beneficent areas of weather prediction, turbulent airflow around jet planes, expansion of the Universe, biological systems, ecology, stock market and sales prediction in economics, and in any operations research problem such as traffic control and airport passenger flow. Monte Carlo simulation has been gainfully employed in almost all science and engineering, both natural and social, and particularly in simulations of the results of the Big Data of artificial intelligence.
In humans, vision is generated from light focused by the eye's lens onto the retina at the inside posterior of the eye. The retina converts the light into electrical signals, which are received by an optic nerve behind the eye that transmits the signals through the lateral geniculate nucleus (LGN) relay pathway to an aggregation of neurons at the posterior region of the brain called the visual cortex, as shown schematically in the figure below. The visual cortex receives, processes, and integrates the vision signals to form a pattern for the brain to cognitively process by means of a network of neurons, one neuron of which is schematically illustrated in the figure below.1
The soma of a cortical neuron is activated if the sum of the signals from other cortical neurons at the neuron's dendrites is greater than some threshold value. The neuron's activation is transmitted to other neurons through axons whose axon terminals are connected to the dendrites of the other neurons, and in accord with the neuroscience maxim,
Neurons that Fire together, Wire together,
synaptic firing patterns of activated neurons are formed within the visual cortex which are then resolved by the brain to form images for cognition.
In computer vision, a camera's lens receives pixels of light reflected from an object and an array of sensors converts the light into electric signals by means of the photoelectric effect. The signals from this artificial retina are amplified and relayed in analogy with the optic nerve and LGN, and transmitted to an artificial neural network (ANN) that is modeled after the visual cortex.
The ANN is a network of layers of arrays of artificial neurons. An example of a four-layer deep artificial neural network (with layers being the vertical columns) has an input layer to receive stimuli to the network, followed by two hidden layers and an output layer, each column layer having four row neurons organized in a 4 × 4 array, with node connections from each neuron in a given column layer to all the neurons in the succeeding column layer, as shown in the schematic figure below.
The activation of an individual artificial neuron is represented by the elements in the network as designated below,
where (L) is the layer number, the subscript j denotes the row neurons in layer L, and the yj in the ANN structure figure are the elements of the decisional output vector.
The artificial neural network's layers are held in computer memory as volume matrices (second-rank tensors) with the artificial neuron activation level as elements, typically with two dimensions for spatial distribution and one for color, and vectors (first-rank tensors) for decisional output. The elements can be binary-activated as either “on” or “off” as in the biological neural network or gradation-activated with levels reflecting the intensity of the light from the viewed image as in digital cameras.
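As a concrete sketch of these containers, NumPy arrays can stand in for the tensors; the 4 × 4 shape follows the network described above, and all names and values here are illustrative.

```python
import numpy as np

# A 4 x 4 layer of activation levels (a second-rank tensor); elements may
# be binary (0/1) or graded like digital-camera intensities.
layer = np.zeros((4, 4))
layer[0, 0] = 0.75            # e.g. a gradation-activated neuron

# A "volume" with two spatial dimensions plus one for color channels,
# and a first-rank tensor (vector) for the decisional output y_j.
volume = np.zeros((4, 4, 3))  # height x width x RGB
output = np.zeros(4)          # decisional output vector
```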
Artificial intelligence algorithms are executed by basic mathematical matrix operations on the ANN's matrix layers, including addition, subtraction, multiplication, convolution, and inner and outer vector products, thus incarnating an artificial neural network's “thinking” process, while parameterization and gradient descent backpropagation effectuate the ANN's “learning”; how well it learns and recognizes is a measure of its “artificial intelligence”.
The number, size, and type of hidden layers are determined by the recognition task. Generally speaking, an artificial neural network with two or more hidden layers is considered to be a deep neural network (DNN). If all the artificial neurons in a succeeding layer are connected to each of the neurons in a preceding layer (as shown in the figure), the layers are said to be fully connected; if only some of the neurons in the preceding layer are connected to a succeeding layer, they form sub-matrix windows (called filters or kernels) in a convolutional layer whereby sectors of a preceding layer are selected for specific, finer or coarser, and positional feature extraction in convolutional neural network (CNN) computer vision.2
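A minimal sketch of such a sub-matrix window in action follows; the kernel and image values are invented, and the loop is a plain (unoptimized) sliding-window convolution rather than any particular library's implementation.

```python
def convolve2d(image, kernel):
    """Slide a small filter (kernel) window over the image; each output
    element is the weighted sum of the sub-matrix under the window."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for i in range(ih - kh + 1):
        for j in range(iw - kw + 1):
            out[i][j] = sum(image[i + m][j + n] * kernel[m][n]
                            for m in range(kh) for n in range(kw))
    return out

# A simple vertical-edge filter: responds where brightness drops abruptly.
edge_kernel = [[1.0, -1.0],
               [1.0, -1.0]]
```

Applied to an image whose left half is bright and right half dark, the filter output is zero over the uniform regions and peaks exactly at the edge, which is the positional feature extraction the text describes.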
In feedforward mode, activated neurons connect to the artificial neurons in succeeding layers of the network to form synaptic patterns of artificial neuron activation, just as in a biological neural network. However, in an artificial neural network the importance of each artificial neuron's activation level towards forming a pattern is artificially modulated by weighting and biasing the activations in a procedure called parameterization that helps to distinguish the features of the pattern.3
The weights and biases are initially arbitrarily assigned, forming a blank canvas for the artificial neural network to perceive patterns by enhancement of significant features and diminution of insignificant ones (such as noise or background). In supervised learning, the ANN is presented with a labeled training dataset and attempts to match it by adjusting the initially random values of the weights and biases, minimizing the difference (error) between its activation pattern and the labeled training dataset by gradient descent and backpropagating that error through its network layers. When the error approaches zero (converges), the distinguishing features of the training set data have been learned, to be used for future recognition tasks.
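Gradient descent with a backpropagated error can be shown at its smallest scale: a single weight fit to one labeled example. The data, learning rate, and epoch count below are arbitrary; real networks do this across millions of weights and many layers.

```python
def train_single_weight(x, target, w=0.0, lr=0.1, epochs=100):
    """Iteratively adjust w so that the prediction w * x matches the
    labeled target by stepping down the gradient of the squared error."""
    for _ in range(epochs):
        prediction = w * x            # feedforward pass
        error = prediction - target   # difference from the labeled data
        gradient = 2 * error * x      # d(error^2)/dw
        w -= lr * gradient            # step downhill (backpropagation)
    return w
```

Starting from an arbitrary weight, the error shrinks geometrically each epoch until the weight converges on the value the training example demands.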
Generally, the first hidden layer detects “high-level” features such as edges from clear shifts in neuron activation level, for example the triangular edges of a cat's ears; the second hidden layer detects “low-level” features such as the color of fur; and succeeding layers pick out or refine features like patterns in the fur, in a process of hierarchical feature extraction. After delineating the features by parameter adjustment commensurate with minimizing the error with the training set, the combined result will provide characteristic features of a cat to be stored for later recognition of viewed objects.
Adding more hidden layers and employing convolutional layers for finer feature extraction and relative position should result in a more refined composite array of features, but additional layers of course carry a greater computational burden.
There are many hyperparameters that can be added and adjusted to fine-tune the artificial neural network's recognition capability, and the backpropagation gradient descent can be hyperparameterized for greater stability and speed of convergence.4
After the supervised training, if the ANN wants to play chess, Go, or video games, it can further undergo reinforcement learning (RL) wherein rewards and punishments are given based on the ANN's moves in the games being helpful or unhelpful in reaching a goal commensurate with success.
A deep belief neural network (DBN) feeds the ANN probability distributions rather than objects or data, providing the ANN with a prior belief, based on those probabilities, about whatever it may be trying to ascertain. For example, in automatic speech recognition, a Hidden Markov Model employs a Markov chain to provide statistical probabilities to infer letters, words, or speech sequences whose occurrences are more probable than others, in cases such as vowels after consonants and figures of speech.
Since speech recognition is strongly dependent not only on acoustic patterns, but also on the timing of utterances, a recurrent neural network (RNN) delays, activates, or deactivates selected artificial neurons for temporal characterization of the speech and for identifying inferences based on prior patterns of speech (recurrence).
In unsupervised learning, the ANN learns from scratch by training against different improving versions of itself in a process of self-strengthening from the bottom up to ultimately develop into a pure-play inference engine such as AlphaGoZero which dominated human champions in chess, Go, and video games without a priori knowing the rules of the games, and is capable of learning pattern and speech recognition directly from actual speech without supervised training.
All of these different types of artificial neural networks, the parameterization, gradient descent, backpropagation, and hyperparameterization will be described in more detail in the following chapters.
A child upon seeing a cat for the first time, and being told by her parents what it is called, creates an image with the prime features of fur, four legs, a long tail, pointy ears, big round eyes, whiskers, and a small pink triangular nose, all arranged in an appealing configuration. The child's visual cortex forms the compound image, labeling it as a “cat” and stores it in memory.
In computer vision, light reflected from an object is picked up by a camera's lens and transmitted through an optic nerve-like relay to retina-like photodetectors that convert light to proportional electrical signals. Those signals are then analog-to-digital converted by sampling the signals at regular intervals and assigning each sample a number representing the level of brightness in a range from 0 to 1, called greyscale, for digital entry into an array of pixels.
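The sampling-and-scaling step can be sketched in one line of Python; the 0–255 raw sensor range is an assumption chosen for the example (8-bit sampling), not something fixed by the text.

```python
def to_greyscale(samples, max_level=255):
    """Map raw sensor samples (assumed 0..max_level) onto greyscale
    brightness values in the range 0 to 1 for entry into a pixel array."""
    return [s / max_level for s in samples]
```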
The array of pixels is represented in a computer as a two-dimensional matrix of pixels of different light intensity with a third dimension holding the primary colors typically red, green, blue (RGB) in proportions that can together generate any color, much like a television camera does in reproducing images on a liquid crystal display (LCD) screen.
Computer vision, however, does more than just reproduce an image of the object for viewing: the computer's three-dimensional volume matrix, in analogy with the human visual cortex, constitutes an artificial neural network (ANN) that can form and store feature maps of objects by artificial neuron synaptic connection patterns, just as the biological visual cortex forms a mental image.
The ANN, just like a child learning to recognize objects, can then undergo supervised training on labeled datasets. The difference between the ANN's synaptic patterns and the training dataset is minimized by adjusting weight and bias parameters applied to the artificial neuron brightness levels to more closely match the training set brightness level patterns, iteratively reducing the “error” in the artificial neural network layers to create synaptic pattern feature maps of the viewed object, to be stored and labeled in the ANN's memory for subsequent recognition.
In classification pattern recognition, when an object is presented to the ANN for recognition, the ANN's labeled feature maps are compared with the viewed object and the object is classified. In clustering pattern recognition, unlabeled data is presented to the ANN, which then groups the data in accord with feature similarities.
The data can be represented by vectors whose components are the features such as size, shape, color, and so on, with their “closeness” determined by the inner (dot) product of different vectors. Most pattern recognition algorithms “recognize” based on statistical inference, and output layers list the probabilities of the recognition of specific features.
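The dot-product notion of closeness can be sketched directly. The feature vectors below (size, roundness, redness) are invented for illustration; normalizing by vector length gives the familiar cosine score, 1.0 for identical directions.

```python
import math

def closeness(u, v):
    """Inner (dot) product of two feature vectors, normalized by their
    lengths so identical feature directions score 1.0."""
    dot = sum(a * b for a, b in zip(u, v))
    mag = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / mag

# Hypothetical feature vectors: (size, roundness, redness)
strawberry = [0.2, 0.8, 0.9]
tomato     = [0.4, 0.9, 0.9]
banana     = [0.6, 0.1, 0.05]
```

Under this measure a strawberry sits much closer to a tomato than to a banana, which is why the later discussion notes that shape and color alone cannot separate strawberries from tomatoes.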
Labeled template recognition can be used for identifying relatively simple, well-defined whole objects, such as printed digits and letters of the alphabet, and clearly specified component parts for robotic assembly, but objects that are even only slightly different may not be classified accurately using templates.
For pattern recognition of more complicated objects and scenes, edges are easiest to detect as they are an abrupt greyscale change in transitioning from the background to the object, for example a cat's ears; likewise for abrupt color changes like an orange cat lying on a blue rug.
However, even abrupt changes can be obscured by noise in the form of variations in texture, scratches or electronic instability in the recognition system. Such noise can be smoothed by replacing the pixel brightness value with the average or median of itself and its neighbors, thus eliminating the noise but preserving the contrast.
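The neighborhood-median idea can be shown on a one-dimensional row of pixels (the values are invented): an isolated noise spike vanishes, while a genuine step edge survives.

```python
def median_smooth(pixels):
    """Replace each interior pixel with the median of itself and its two
    neighbors, eliminating isolated noise while preserving contrast edges."""
    out = list(pixels)
    for i in range(1, len(pixels) - 1):
        out[i] = sorted(pixels[i - 1:i + 2])[1]
    return out
```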
Depth perception is gained by two separated cameras for stereographic imaging, but this requires correlation of the corresponding points of each detector's image. This can be done by reducing the greyscale arrays to edge maps, scanning the maps for similar appearances to identify the corresponding points, then measuring the distance to each camera's image plane, and from those differences, reconstructing a three-dimensional image point-by-point.
This is a seemingly very involved process requiring almost instantaneous multiple computations for every pixel in an image, but that is just what a computer can do, in this case almost as well as two biological eyes.
An important feature for recognition is an object's texture; that is, regular patterns of pixel brightness, such as the structural analysis of tokens (salient features) like kernels in an ear of corn, and statistical analysis of directional coherence like a cat's fur. Tokens and coherence can be relatively easy to detect by means of the statistical probability that a pixel's intensity level will be similar to that of its near neighbors.
A camera needs only three primary colors (“primary” meaning no one color of the primary three can be made from mixtures of the other two), typically red, green, and blue (RGB), because the admixture of components of each primary color can produce any color. This can be represented by a simple equation,
where r, g, and b are the component amounts of each primary color. So in a three-dimensional coordinate system with red, green, and blue axes, any color can be represented by a point in the 3D color space from the value of its RGB components.
One of the main difficulties of color recognition is that its three independent attributes, hue, intensity, and saturation, depend critically on the type, angle, and intensity of illumination on the object. To recognize hue, computer vision must first determine the intensity and saturation; the intensity can be taken as the average of the three RGB intensity values and the saturation is the ratio of color to illumination. However, the accurate distinction of colors in computer vision may still be difficult under different lighting conditions.1
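These two quantities can be sketched as the text defines them, using the common hue-saturation-intensity (HSI) convention for saturation; RGB values are assumed normalized to 0..1, and the particular formula is one standard choice rather than the book's own.

```python
def intensity_and_saturation(r, g, b):
    """Intensity as the average of the three RGB values; saturation as how
    far the color departs from pure grey at that illumination level."""
    intensity = (r + g + b) / 3
    saturation = 0.0 if intensity == 0 else 1 - min(r, g, b) / intensity
    return intensity, saturation
```

A mid-grey pixel (0.5, 0.5, 0.5) has zero saturation, while a pure red (1, 0, 0) is fully saturated, illustrating why hue cannot be judged before these two are pinned down.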
Combinations of shape, texture, and color called features can be extracted for classification of, for example, strawberries and bananas in a basket of fruits based solely on their different shape and color, but separating strawberries from tomatoes would require size scale and texture analysis.
For more complex objects, structural relationships must be employed for recognition; for example, the legs, tail, pointy ears, fur, big round eyes, and triangular nose must all be present for the object to be identified as a cat, but different poses, such as sitting serenely or lying on its back with legs akimbo, must still be classified as a cat.
Suppose an artificial neural network was presented with a cute little orange shorthair puppy, but its training set's dog classification primarily consisted of large dogs such as Great Danes and German Shepherds, while its cat classification included many cute small orange shorthair kittens; then the ANN may well misidentify the dog as a cat.
If after many more training runs to enlarge and refine the dog feature maps, our puppy is still misidentified as a cat, it will be incumbent on the ANN to undergo more supervised learning on puppy dog training sets.
Alternatively, the ANN could recalibrate the cat feature maps to emphasize the apparently distinguishable differences, for example a cat's pink nose as opposed to a dog's black nose, so the third-dimension color matrix element pink for the nose should be more heavily weighted, or a pink nose-color bias hand-engineered for cats.
With more training sets, including say dog-lookalike floppy-eared Scottish Fold cats and cat-lookalike Samoyed dogs, plus more training epochs with hand-engineered weights and biases (for example a greater bias towards cats in response to their pink noses), progressively finer distinctions can be made, and the ANN will be better able to classify dogs and cats.
The hidden layers in an artificial neural network conjunctively extract distinctive features by feedforward artificial neuron activation from one layer to the next for hierarchical feature extraction. In biological vision systems, cortical neurons are believed to be binary activated (on/off) and only activation levels above some threshold will instigate a synaptic connection to other neurons. In artificial neural networks, the initial activation levels can be proportional to the input signal intensity, providing scaled information, but the activation levels usually are randomly selected, providing a blank canvas that when modulated by parameterization, better reflects the training set data.
Parameterization provides greater significance to distinctive features by enhancing artificial neuron activation levels, or conversely diminishing the indistinguishing neuron activation levels, by multiplying the activation level by a weighting factor and adding a bias to the sum of activation levels in a given layer.
A perceptron is an artificial neuron that takes multiple binary (0 or 1) inputs xi and produces a single binary output a (Yes = 1 or No = 0). Each xi input is multiplied by a weight wi reflecting the significance of that input, the weighted inputs are summed, and the decision a is made to activate (yes) if the weighted sum is greater than or equal to some chosen threshold bias b, or not to activate (no) if it is less, as illustrated by the schematic figure,1
For example, you are trying to decide whether to attend a piano recital; there are a few decisional factors but it is a binary decision, yes or no. Say the input factors are:
x1 = close by so can walk,
x2 = girlfriend goes with you,
x3 = girlfriend's little brother comes too.
You analyze the situation: You like piano music, the recital is close by, and you think your girlfriend would like to go with you, but she may bring her little brother along. In that case do you still want to go? Is any one factor decisive? If not, then a deeper analysis is required.
The relative importance of each input factor can be determined by attaching a weight to it; for instance, having no car and little money, being within walking distance is important, so w1 = 3; your girlfriend coming is very important, so w2 = 5; but her little brother also coming is a definite negative, so w3 = –4. You like piano music, so the threshold for attending is a low b = 6.
Therefore, if you can walk to the concert and your girlfriend will come without her little brother, the weighted sum is 1 × 3 + 1 × 5 + 0 × (–4) = 8 > 6, and so the decision will be to go, you can walk with only her to the recital. If she (and presumably her little brother) will not come with you, then you won’t go even if you can walk to the recital since 1 × 3 + 0 × 5 + 0 × (–4) = 3 is below the threshold. If she will go but brings her little brother, your decision is more difficult, but the perceptron will decide for you. The weighted sum is 1 × 3 + 1 × 5 + 1 × (–4) = 4 < 6, so you will not go if her little brother comes along.
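The recital decision can be checked directly in code, using the weights and threshold from the example above.

```python
def perceptron(inputs, weights, bias):
    """Fire (1) if the weighted sum of the binary inputs reaches the
    threshold bias, otherwise 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= bias else 0

weights = [3, 5, -4]   # walkable, girlfriend comes, little brother comes
bias = 6

go_with_her = perceptron([1, 1, 0], weights, bias)      # sum 8 >= 6
go_alone = perceptron([1, 0, 0], weights, bias)         # sum 3 <  6
go_with_brother = perceptron([1, 1, 1], weights, bias)  # sum 4 <  6
```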
Parameterization clearly helps decision-making, but the perceptron has also identified a critical factor, namely little brother. Your decision can be helped along by finding some ground truth, for instance if (hopefully) little brother has violin lessons that night. The worst case is of course your girlfriend does not come with you, but her little brother does, you have to pay for an Uber ride, and the pieces played at the recital are all atonal.
Interestingly, in addition to deciding for you whether to go or not, the perceptron also reveals an inference that piano music may not be all that important to you; that is, your girlfriend's unfettered accompaniment is likely more important than any cultural pretensions you may have.
If you find out that the recital will have pieces by composers that you particularly like, you can lower the threshold towards attending; that is, the bias towards attending is increased, perhaps even to the extent of little brother tagging along being worth it.
The perceptron therefore is a sophisticated decision-making device operated by assigning weights that signify the relative positive and negative significance of decisional factors, with a threshold bias reflecting the importance of the decision.
This kind of parameterization in artificial neural networks is done by treating the artificial neuron layer as a perceptron, weighting each neuron's activation level and biasing the sum in a layer to enhance or diminish the effect of that neuron layer on the activation of the succeeding layer's neurons to ultimately form synaptic patterns in the network representing features in the training set data.
In the perceptron example above, the weights and biases were assigned according to an individual's preferences, but just how are the weights and biases determined for any given situation by a machine which ostensibly has no preferences?
The initial values of the weights can be arbitrarily assigned because they will be tuned by a parameterization that will iteratively adjust the values of the weights for matching a labeled training data set, thereby constituting a learning process. Random values of a Normal (Gaussian) distribution are typically used in initialization so as to avoid subsequent learning of different objects being similar because of the same initial weightings.
Described mathematically, the weighted activation levels of all the neurons in layer (0) of the network are connected to every neuron in the succeeding layer (1) of the artificial neural network and modulated by weights; for example, the activation of the ith neuron in the succeeding layer (1) is given by the sum of the ith row, jth column weights wi,j times the activation levels of the preceding layer (0)'s j neurons,2
The weights thus are like a measure of the strength of the connections between the neurons in adjacent layers of the network. As such, it can be seen that the first synaptic connection strength pattern in the input layer, based solely on random activation levels, is refined by adjusting the weights and biases in accord with minimization of differences with the training set data. Hidden layer hierarchical feature extraction will further dovetail the characteristic features and their surroundings.
To further improve image feature discrimination capability, different biases (b0), can be added to the sum of weighted activation levels, for instance positive biases to ensure that the weighted sum in a neuron layer will meaningfully contribute to the feature extraction of distinct features, or conversely negative biases to downplay or totally ignore irrelevant features and noise,
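A minimal sketch of one such layer update follows; the weight matrix, biases, and activations are all illustrative numbers, with NumPy's matrix product doing the per-neuron weighted sums.

```python
import numpy as np

W = np.array([[0.2, 0.8],     # w_ij: connection strengths, layer (0) -> (1)
              [0.5, 0.5]])
b1 = np.array([0.1, -0.3])    # per-neuron biases added to the weighted sums
a0 = np.array([1.0, 0.0])     # activation levels of the preceding layer (0)

# Each layer-(1) neuron i receives sum_j w_ij * a_j plus its bias b_i.
a1 = W @ a0 + b1
```

A positive bias lifts a neuron's sum so its feature contributes meaningfully; a negative bias suppresses it, exactly as the text describes for irrelevant features and noise.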
Instead of the binary “on” (1) or “off” (0) of biological cortical neurons or the unlimited linear increase of light intensity in the interval (0, ∞), optical displays use a greyscale of decimal values between 0 and 1 for activation intensities. Because the decimal fractions within a range can be infinitesimally small, the artificial neuron activation level can be extremely fine yet still distinguishable (particularly by a computer). Furthermore, small changes in weights and biases will properly produce small changes in the feature maps, and will not inadvertently cause the derivatives in gradient descent to flip in reversals of significance, nor will large changes in weighted sums and biases cause the gradient descent backpropagation to diverge.
Greyscale is generated by the sigmoid function (also called the logistic function), which maps any variable x into the range (0, 1), producing a countably infinite number of possible values between the asymptotes at 1 and 0,
To express the probabilities used in pattern recognition, the sigmoid function's output must be positive because there is no such thing as a negative probability; and since the exponential function e^x takes only positive values, it is ideally suited to providing positive greyscale values.3
The sigmoid function can also represent a smoothed version of a step function, so it can differentiate “Yes/No” depending on whether x is negative or positive, as can be seen in the plot of σ(x) in the figure above; objects can thus be separated onto the left and right sides of the graph, so the function can be used for object classification, such as distinguishing cats from dogs, germane emails from spam, and hip-hop from classical music aficionados in social networks.
The sigmoid function for one neuron in one layer operates on the neuron activation level weights and biases as
There are multiple neurons in each layer of a neural network, so a given neuron receiving synaptic greyscale activation from a preceding layer's neurons itself has an activation level that can be represented by a multiplication of a weights matrix times the preceding layer neuron activation level vector, plus the bias vector, and then the whole kit operated on by the sigmoid function.
For example, the greyscale activation levels for the neurons in layer (1) as a function of the neuron activation levels of layer (0) in our simple 4 × 4 neural network are given in matrix form by,
where a(1) is a column vector representing all the weighted neuron activation levels in layer 1 expressed in terms of the sigmoid function of the weights matrix times the previous layer's neuron activation levels plus the biases for each weighted vector sum. The equation then can be written compactly as,
where W is the weighting matrix, and b is a column vector of biases in layer (0), with the sigmoid function operating on the vector Wa(0) + b(0) to produce the weighted greyscale activation a(1) of the neurons in layer 1. This elegant equation explicitly shows the dependence of succeeding layer neuron activations on the weighted neuron activation levels and their biases of all the neurons in the preceding layer.
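This compact matrix equation can be sketched in a few lines of NumPy; the 4 × 4 layer sizes, the random seed, and the variable names below are illustrative assumptions, not code from the text.

```python
import numpy as np

def sigmoid(z):
    # map any real input into the greyscale range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)     # reproducible Normal initialization
W = rng.normal(size=(4, 4))        # weights connecting layer (0) to layer (1)
b = rng.normal(size=4)             # biases, one per layer-(1) neuron
a0 = rng.random(4)                 # greyscale activations of layer (0)

# a(1) = sigmoid(W a(0) + b): the whole succeeding layer in one line
a1 = sigmoid(W @ a0 + b)
print(a1)                          # four values, each strictly between 0 and 1
```

Each row of W holds one succeeding-layer neuron's connection strengths, so the matrix-vector product computes every weighted sum at once.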
Seen in this way, each neuron layer is a perceptron arrayed in a multilayered network called appropriately enough a Multilayer Perceptron (MLP) model, the earliest modern artificial neural network. Each neuron has an input comprised of the weighted and biased neuron activation levels from the preceding layer, and a resultant activation of its own, which will be subsequently modulated by weights and biases for input to the neurons in the next succeeding layer in the so-called feedforward mode.
The activation of layer (1) neurons in terms of the sigmoid function is,
The activation of every layer's neurons then must be processed as above, and since artificial neural networks may have many layers and neurons, together with voluminous training set data, even though computationally burdensome, such a network is eminently programmable because computers are very good at the linear algebra of matrices, and computations are fast using massively-parallel graphical processing units (GPUs) which can compute the matrix operations simultaneously.
Such daunting calculations show why artificial intelligence could not really take off until the advent of mass-storage, ultra-fast parallel-processing computers, and the creation of efficient learning and recognition algorithms taking advantage of that hardware.
In addition to the new hardware and the innovative learning algorithms it made possible, artificial intelligence has been developed more rapidly and comprehensively through computational software such as MATLAB, Octave, and NumPy (the latter two free and open source), freely accessible training sets such as ImageNet, and the free public use and inclusive system development of host-computer coding platforms such as GitHub and Red Hat, all conveniently available to anyone on the Internet.
With the 21st Century's decentralization (democratization) and sharing (communization) of artificial intelligence resources, rapid widespread AI development could proceed vigorously after the AI Winter of the 1990s.
The sigmoid function and the hyperbolic tangent function (tanh) were commonly used in early AI systems, but more recently-developed systems use the simpler Rectified Linear Unit (ReLU) function defined as,
which is just a straight 45° line that changes all negative activations to 0 and, with a softmax function in the network, will provide greyscale. This simple function is obviously easier to compute than the sigmoid and hyperbolic tangent, and therefore faster to process, while making no significant difference in accuracy compared to the other greyscale functions.
Furthermore, since the learning gradient approaches zero when sigmoid and tanh neuron activation level outputs are near either 0 or 1, learning will severely slow down or stop (saturate). An open-ended ReLU neuron will not saturate, so learning will not slow down; however, if the final weighted input to a ReLU neuron is negative, the gradient vanishes (goes to zero), and so the ReLU neuron will also stop learning altogether. These imperfections have spawned a plethora of alternative greyscale conversion functions that can be employed ad hoc as needed for particular AI tasks.4
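The saturation contrast can be made concrete with a short sketch (illustrative values, not from the text): the sigmoid's derivative σ′(z) = σ(z)(1 − σ(z)) collapses toward zero at large |z|, while the ReLU gradient is 1 for any positive input and exactly 0 for negative ones.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # the straight 45-degree line: zero for negative inputs
    return np.maximum(0.0, z)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

# sigmoid derivative: nearly zero at z = -10 and z = +10 (saturation)
sig_grad = sigmoid(z) * (1.0 - sigmoid(z))

# ReLU derivative: no saturation for positive z, but a "dead" zero
# gradient for negative z
relu_grad = np.where(z > 0, 1.0, 0.0)

print(relu(z))
print(sig_grad)
print(relu_grad)
```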
Choosing among sigmoid, tanh, ReLU, and the others is thus an exercise in first addressing the task at hand in its simplest implementation and then, as problems arise, finding the best corrective functions and procedures, usually by trial and error. When improved classification and prediction accuracy are required, algorithm tuning, different cost functions to enhance convergence, hyperparameter tuning, and so on can be called upon, all the while keeping the computational burden in mind. Procedural choices often are made by just going with what works best, without necessarily completely understanding why something works better than something else.
The parameterization of the artificial neuron activation levels provides a means for dynamic adjustment of synaptic patterns for feature extraction, but how exactly are the weights and biases adjusted so that the artificial neural network can learn, and thus exhibit artificial intelligence?
An artificial neural network (ANN) must be trained to recognize objects and data, and young ANN is learning how to recognize by being presented with training sets of images and data. Her neuron activation levels have been parameterized initially by a Normal distribution of weights and biases; her neural network is thus a blank canvas, and she is ready to machine learn.1
The difference between her initial random artificial neural activation level distribution and the training set data naturally will be quite large; her learning task then is to reduce that difference in order to extract features characteristic of the training set images and data and store them in memory for recognizing the images and data presented to her. That difference is quantified by a Loss Function, expressed as the sum of the squares of the differences between the outputted activation levels of the decisional row-vector neurons in layer L and the training set y vector,2
As employed in artificial intelligence, taking the average of the Loss Functions over all m samples in the training set gives the Average Cost Function; it comprises all the weights and all the biases in the network, averaged over all the samples i in the training set.
The Average Cost Function is a term taken from economics, where it is the cost of inefficient production; C thus can be thought of as just the Cost of Being Wrong with regard to matching the training set, so minimization of the Average Cost Function will reduce the differences between ANN's neuron activation patterns and the training set.
Calculating the Average Cost Function will not be as computationally burdensome as calculating the Loss Function of each training set example in turn, and it is noted by some that taking the average will beneficially average out noise and irrelevant factors that may be in the data.
The Average Cost Function then is defined as,3
where recall from Chapter 15 that
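As a concrete sketch with made-up numbers (and omitting the factor of 1/2 that some formulations include), the Average Cost over m labeled samples can be computed as:

```python
import numpy as np

def average_cost(outputs, targets):
    # sum of squared differences per sample, averaged over the m samples
    per_sample_loss = np.sum((outputs - targets) ** 2, axis=1)
    return float(np.mean(per_sample_loss))

# m = 2 training samples, 2 output-layer neurons each
a_L = np.array([[0.9, 0.1],    # output activations a(L)
                [0.2, 0.8]])
y   = np.array([[1.0, 0.0],    # training set labels
                [0.0, 1.0]])

print(average_cost(a_L, y))    # (0.02 + 0.08) / 2 = 0.05
```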
In elementary calculus, minimization of any continuous and differentiable function of a single variable is easily done by taking the derivative with respect to that variable and setting it equal to zero to find where the slope is horizontal (that is zero slope), which is the minimum point of a concave curve.
Taking C to represent any Cost Function, a quadratic C is continuous and differentiable, so it can be minimized, for example with respect to a weighting factor variable w,
To find the minimum point with respect to a weighting variable w, randomly choose a point on the curve and determine which direction to move in order to reach the minimum point on the C-w curve. This direction is given by the sign of the slope of the curve at the chosen point; the analogy is a ball rolling on the Average Cost Function curve: it will roll left if the slope is positive and right if the slope is negative, with a speed depending on the severity of the slope. If not rolling too fast, it will stop at the horizontal-slope minimum, thereby determining a minimum Cost.
This gives the clue that this operation would be best described using a vector since it has both magnitude and direction. From a randomly chosen point on the Average Cost Function versus weighting variable curve, as the minimum point is approached, since the absolute value of the slope decreases and the rolling ball slows down, the learning algorithm can accordingly reduce the step size of the ball to prevent inadvertently overshooting the minimum point.
Unfortunately, the Cost curve with respect to the variable w may have many concavities and therefore multiple local minima, as well as convex-curve maxima (which also have zero slope), as shown in the figure below at left, and the Cost will not be minimized with respect to a particular weighting variable unless the global minimum, meaning the absolute minimum of the Average Cost Function and not just a local minimum, is found. The problem of encountering local minima in the search for the global minimum has plagued artificial intelligence research since its inception.
If the minimization of the Cost Function for two variable weights is to be determined, instead of C-w curves there will be a three-dimensional contour of Cost Function versus weights as shown in the figure below at right, with a rolling ball searching for the minimum of the contour,
Local and global minima for two-dimensional and three-dimensional models can be found by simply looking at the computer-generated C-w plots, but artificial neural networks usually have many, many weight and bias parameters, so the Cost will be a function of many, many variables, and any contour of more than three dimensions is impossible to visualize.
Although humans cannot physically visualize in greater than three dimensions, they can conceptually view with the mind's eye, and fortunately the computer can operate in almost infinite dimensions, and so the computation of Cost Function minimization with respect to any number of variables can be performed by finding the gradient of vector analysis. The gradient is a vector representing the direction of steepest descent, with a magnitude that is a measure of the steepness of the multidimensional contour, and for finding a minimum it will be a gradient descent and thus a negative gradient vector,
The gradient operating on the Cost Function C in an example for just the weighting variables in three dimensions is,
where i, j, k are the unit vectors in the x, y, and z directions, respectively, and wi are the weights in those directions.
A ball rolling on the Cost Function contour will roll towards minima, and although it is impossible to visualize, a greater than three multi-dimensional contour still can be described mathematically with the same ball analogy, so a gradient dependent on all the weight parameters in an artificial neural network of many dimensions j in “directions” designated by the unit “directional” vectors ej in principle can always be calculated as,
Multivariate calculus is a good example of how mathematics can broaden one's mind beyond what can be perceived by the senses. That is, “seeing is believing” is helpful, but neither necessary nor complete in mathematics where “seeing” really means conceptual contemplation.
This is one of the reasons why doing mathematics may indeed be the touchstone of intelligence, for no one can doubt that the construct and comprehension of the complex conjugate vector calculus of infinite-dimensional Hilbert space requires no little grey matter.4
The algorithm to compute the minimization of the Cost Function in multivariate calculus computes the gradient vector at a point on the multidimensional contour, takes a step in the gradient direction, and repeats the calculation of the gradient vector over and over until a minimum is found. The size of the step is called the learning rate; a large step for fast learning will hasten the process to convergence and reduce computational burden, but too large a step may overstep a minimum.5
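The stepping procedure can be sketched for a single weight on a simple quadratic Cost; the cost C(w) = (w − 3)², the starting point, and the learning rate below are illustrative choices, not values from the text.

```python
# C(w) = (w - 3)^2 has its global minimum at w = 3
def grad(w):
    return 2.0 * (w - 3.0)        # dC/dw

w = 0.0                           # randomly chosen starting point
learning_rate = 0.1               # the step size
for _ in range(100):
    w -= learning_rate * grad(w)  # step in the negative gradient direction

print(w)                          # converges toward 3
```

With learning_rate = 0.1 each step shrinks the error by a factor of 0.8, so 100 steps land within about 10⁻⁹ of the minimum; a learning rate above 1.0 would overshoot and diverge on this curve.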
The gradient descent of the Cost Function will involve taking the derivatives of the sigmoid (or other greyscale conversion) function σ, the weighting matrix W, the previous layer activation levels a(L−1), and the previous layer biases b(L−1), altogether giving the values of the weights and biases that will cause the most rapid minimization of the Cost Function. The calculations are shown in the next Chapter for backpropagation.
The sign of the resulting adjusted weights and biases promoting Cost minimization indicates a higher (+) or lower (–) adjustment, and the relative magnitudes of the weights and biases reveal which of the adjustments will have the greatest impact in reducing the Cost, in other words, how sensitive the Cost is to a particular parameter adjustment.
The gradient descent therefore amazingly encodes the relative significance of each weight and bias towards minimizing the error and accurately reflecting a labeled training sample, in effect teaching the artificial neural network how to properly regard the training set data for what it represents.
Since the derivative of the gradient shows how the gradient is changing, as its absolute value decreases it must be nearing an extremum, so the second derivatives of the Cost with respect to the weights and biases can be arrayed in a Hessian matrix, whose determinant can be calculated and used to test for extrema: if the Hessian is positive definite, the point is a minimum; if negative definite, a maximum; and if the determinant is negative, a saddle point of a hyperbolic paraboloid.
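A numerical sketch of the second-derivative test (an illustrative finite-difference construction, not a procedure from the text), using the bowl-shaped Cost C(w1, w2) = w1² + w2², whose only extremum is the minimum at the origin:

```python
import numpy as np

def hessian(C, w, eps=1e-4):
    # central-difference estimate of the matrix of second derivatives
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            H[i, j] = (C(w_pp) - C(w_pm) - C(w_mp) + C(w_mm)) / (4 * eps**2)
    return H

cost = lambda w: w[0]**2 + w[1]**2
H = hessian(cost, np.array([0.0, 0.0]))
print(H)                  # approximately [[2, 0], [0, 2]]
# positive determinant with positive diagonal entries: a minimum here
print(np.linalg.det(H))
```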
The fearsome burden of calculating second derivatives thankfully has some labor-saving alternatives, such as having gradient descent act to change velocity (already a derivative) instead of position, which avoids the large second-order derivative calculations; in this physics-oriented approach, a momentum with friction reduces the velocity as minima are approached in small steps from different points on the curve.6
Gradient descent and/or the Hessian will find the first minimum encountered but will stop when the slope at that minimum is zero, essentially stopping the learning process, once again foundering in a local-minimum hollow. At this point, it is necessary to restart the calculation from that point to find the next minimum, hopefully a global minimum instead of just another provincial local minimum.
The process of gradient descent to minimize the Cost Function employs the simple minimization calculus. But the artificial neural network (ANN) is forming synaptic connection patterns at each layer and passing them on to the succeeding layer. Therefore, gradient descent must operate throughout the layers of the artificial neural network with reference to preceding layers. This is done by performing gradient descent going backwards through the network, in a process called backpropagation, essentially a feedback loop adjusting the weights and biases to minimize the Cost Function through calculation of the gradient descent at each layer in terms of the preceding layer.
The significance of each artificial neuron's activation level towards matching the training set data can be increased or decreased by changing its weights and biases, and since the activation levels of the preceding layer's neurons affect the activation levels of a given layer's neurons, going backwards through the neural network layer-by-layer and recursively adjusting the weight and bias parameters in each preceding layer in accord with minimizing the Cost of Being Wrong will, in principle, ultimately match the ANN synaptic pattern to the training set data pattern.
Since every run requires considerable computational power, the training data typically is divided into mini-batches and run in turn as iterative stochastic gradient descents with validation and test sets to enhance the network's accuracy in matching the training set data.
Gradient descent backpropagation is based on the Gauss-Newton numerical analysis computation for non-linear least-squares problems, a variant of Newton's Method, where “Newton” refers to numerically taking the derivatives that Isaac Newton conceived. The process of backpropagation is based on the fundamental chain rule of differential calculus, as will be seen.
The Loss Function for a given artificial neural network layer L was given in Chapter 16 where a(L) is the activation vector of the row neurons in layer L and y is the training set vector,
Since this will be the Cost of Being Wrong for each layer L, to avoid confusion with the L designating the network layer, and further to be in accord with the AI literature, we will use C for the Cost Function per layer; it is understood that the Average Cost Function of Chapter 16 is the average of C over all the training set examples.
As is often done in mathematical derivations, a change of variable makes life easier, so define a new variable z(L) in terms of the weights w(L) and the biases b(L) in layer L, and the activation level of the neurons a(L–1) in the previous layer (L–1),
Recall that the neuron activation level of a neuron, together with its weight and bias in layer L, is converted to greyscale by operation of the sigmoid function,
To first compute the sensitivity of Cost to a change in the weighting factor variable, ∂C/∂w(L), the fundamental chain rule of calculus is used, which is the mathematical basis of the idea of backpropagation, and we can write,1
Taking the derivatives of each term starting with the last term (a prime on the sigmoid function σ denotes taking the derivative, as shown),
The last equation says that the change in z(L) with respect to the weight w(L) in layer L depends on the activation intensity of the neuron in the preceding layer, a(L–1); that is, as in the synaptic patterns of biological brains, the neurons that fire together are wired together, and in the backpropagation of artificial neural networks, the artificial neurons that fire together are chained together. So,
Now average the Costs with respect to the weights in a given layer L over the m training set examples,
This gives the average over the dataset for one element of the gradient-descent vector, which includes all the partial derivatives with respect to the weights of a given layer.
Repeating the process for the rate of change of Cost with respect to the biases gives,
But from the definition of the new variable z(L) given before,
and since the other terms have been determined above,
and recall that
Now just repeat the process, iterating backwards through all the layers one-by-one, to minimize the Cost with respect to the weights and biases of each layer in turn.
To consider each and every neuron in the layers, just add row subscripts to the a's and two subscripts (for row j and column k) to the w's in the terms so that2
and the Cost over the j rows of neurons will be the sum over m training set examples,
The chain rule expression is now,
Then sum the above expression over all the different layers L. This is the same as for the single-neuron example, except for the Cost with respect to the activations in layer L-1; that is, because there are multiple neurons in the layers, the L-1 neurons influence the Cost through multiple different paths, and these must all be added up.
Performing these derivatives for a given layer will adjust the weights and biases in relation to the preceding layer in accord with minimizing the Cost, and the activation levels of the artificial neurons (hopefully) will converge to match the training set data.
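The chain-rule product above can be checked numerically for a single sigmoid neuron; the particular values of the weight, bias, preceding activation a(L−1), and label y below are arbitrary illustrations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.5, 0.1        # weight and bias in layer L
a_prev = 0.8           # activation a(L-1) from the preceding layer
y = 1.0                # training label

z = w * a_prev + b     # the change of variable z(L)
a = sigmoid(z)         # the neuron's greyscale activation a(L)

dC_da = 2.0 * (a - y)                    # derivative of C = (a - y)^2
da_dz = sigmoid(z) * (1.0 - sigmoid(z))  # sigma'(z)
dz_dw = a_prev                           # dz/dw = a(L-1): fire together...
dz_db = 1.0

dC_dw = dC_da * da_dz * dz_dw  # the chain-rule product for the weight
dC_db = dC_da * da_dz * dz_db  # and for the bias
print(dC_dw, dC_db)
```

A finite-difference check of dC_dw against (C(w + ε) − C(w − ε)) / 2ε confirms the chain-rule product.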
Ostensibly a great deal of computation, all these calculations can be efficiently performed by freely accessible software computational programs such as those found on GitHub.3
In summary, the chain rule gives expressions for the derivatives that determine each component of the gradient-descent vector; by going backwards through the network and adjusting the weights and biases in each layer in turn, the algorithm repeatedly steps downhill through the network layers on the steepest slope towards the minimization of the Cost.
For understanding today's artificial intelligence, it is critical to realize that the artificial neural network was not specifically told what features of the training set data to learn or how to learn them; the network learned all by itself because it knows calculus. That is, the AI machine knows the minimization techniques and how to apply the chain rule of calculus to backpropagate an error.
This is the essence of bottom-up artificial intelligence: the AI machine was not programmed from the top down to perform specific tasks or recognize particular things; rather, within its hidden layers and through its algorithms, gradient descent and backpropagation, the machine can perform and arrive at conclusions based on images and data. In other words, once trained, the AI machine can act autonomously, and then, through reinforcement and unsupervised learning, it can learn more and improve its capabilities on its own.
The human visual cortex is segmented into small clusters of neurons, each responsive to specific features or sectors in the visual field; these so-called receptive fields are then merged to constitute the entire mental image in the neural network.
In analogy with the visual cortex, a computer vision artificial neural network receives light reflected from the object of interest and converts the photons to electrical signals that activate artificial neurons in the network's input layer matrix.
In a multilayer perceptron (MLP) neural network, all the neurons in adjacent layers are fully connected, and therefore cannot directly pick out and focus on specific areas of the visual field for closer examination.
The hidden layers of a convolutional neural network (CNN) are not fully connected, but rather are interleaved with smaller windows of matrix filters akin to the receptive fields of the human visual cortex. In a typical CNN, a first convolutional filter sweeps over the input matrix layer, and succeeding convolutional filters stride over the preceding hidden layer picking out features to help constitute a feature map of the viewed object.
A convolution is simply a mathematical operation on two functions that produces a third function representing how the functions are conjoined. In two dimensions, a mapped function f(x, y) is convolved by figuratively placing a convolving filter h(x, y) over f(x, y), spatially stepping through it, and integrating the result,
The double integral is over the filter's x times y area spanning a layer sector of the same size; the displacement shown is a +1 step of the h(x + 1, y + 1) filter over f(x, y) (typically starting from the upper left-hand corner, moving first to the right and then down) as it strides step-by-step over successive same-sized sectors of the input matrix f(x, y), extracting feature map matrices of the same size as the filter. The filter type, size, starting point, and step size can be chosen to fit the mapping task at hand.
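In the discrete form CNNs actually compute, the integral becomes a sum of elementwise products at each stop of the filter. A minimal sketch with a made-up 4 × 4 input and 2 × 2 filter (stride 1, no padding):

```python
import numpy as np

def convolve2d(f, h):
    # slide filter h over image f, stride 1, starting at the upper left,
    # summing elementwise products at each stop
    fh, fw = f.shape
    kh, kw = h.shape
    out = np.zeros((fh - kh + 1, fw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + kh, j:j + kw] * h)
    return out

image = np.array([[1, 2, 3, 0],
                  [0, 1, 2, 3],
                  [3, 0, 1, 2],
                  [2, 3, 0, 1]], dtype=float)
filt = np.array([[1, -1],
                 [1, -1]], dtype=float)  # a crude vertical-edge detector

feature_map = convolve2d(image, filt)    # 3 x 3 feature map
print(feature_map)
```

Strictly, this is cross-correlation (the filter is not flipped), which is what most CNN libraries implement under the name convolution.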
The convolution process combines taking the inner product of the vector rows of two adjacent matrices and then integrating, which is a measure of the degree of confluence of vectors within the matrix.
The inner (or dot) product of two vectors is defined as the magnitudes of the vectors times the cosine of the angle θ between them. The dot product can be illustrated for those who like basketball by the trajectory vector of the ball approaching the basket from the top of its arc with θ being the angle between the vector of the ball and the vertical axis vector through the hoop. So if the ball is coming vertically down along the axis of the hoop (θ = 0° and cosθ = 1), it “sees” the full circular area of the hoop and thus has the maximum-sized target for a score. If the ball comes in horizontally (θ = 90° and cosθ = 0), it sees only the edge of the basket, and there is no possibility of a score. If the ball comes in at an angle of 60° (a relatively “flat” shot), cosθ = 0.5, it sees only half the area, so it has only half the chance of a vertically-falling shot. Generally, the smaller the angle of the ball with the vertical, the more likely the ball will go through the hoop as measured by the cosine of the angle that determines the size of the target hoop as “seen” by the basketball.
Of course, shooting a ball almost straight up so it falls almost vertically through the net takes inordinate strength and, because of the long trajectory, is more difficult to control; so, as a trade-off, an angle of say 45° presents 0.707 of the hoop area, which is much better than a flat trajectory of say 75°, which presents only a 0.26 hoop-area target.1
From this, it can be seen that the dot product is a measure of the magnitude of the union of the two vectors as a scalar value. The inner product is a generalization of the dot product to multidimensional vector space, obtained by multiplying the corresponding elements of the row vectors in a multidimensional matrix and summing the products to produce a scalar measure of the union of the multiple vectors constituting a new matrix.
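The dot product and its cosine relation can be checked numerically (an illustrative sketch with arbitrary vectors; not code from the book):

```python
import numpy as np

a = np.array([3.0, 0.0])
b = np.array([1.0, 1.0])

# Sum of elementwise products...
dot = np.dot(a, b)
# ...equals |a| |b| cos(theta), so cos(theta) recovers the angle.
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
print(dot, cos_theta)  # 3.0, cos 45 degrees ~ 0.7071
```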
Taking the double integral over the inner product of the matrices produces a confluence over area that measures the converging of the vectors in the matrices, just as in the merging of two flowing rivers, from whence the term came.
A convolutional filter striding over an artificial neural network matrix layer can be seen as a sliding inner product that extracts the confluences between the filter and the matrix layer, detecting, augmenting, or dampening, and thereby extracting features.
As a filter matrix (also called a kernel or window matrix) glides over a matrix layer like a flashlight beam, it convolves the activation levels of the “illuminated” regions of the matrix, row by row. Prominent features will be enhanced because the matrix-layer elements having higher-weighted activation levels will be affirmatively convolved, through the inner product confluence, with the higher-weighted activation level elements of the filter. Weaker features will be negatively convolved because the inner product of the low-weighted activation levels in the matrix layer with the filter will be small or negative.
The double integral produces a sum calculated by adding the inner products of the row vectors of the filter matrix and the covered sector of the layer, producing a single activation level with shared weights and a shared bias that is registered in a single destination pixel positioned at the center of the corresponding sector of the newly convolved layer. The weight- and bias-sharing scheme greatly reduces the computational burden, and may also reduce noise because all the shared weights and biases of the convolved area are convolved into a single destination pixel.
The process for one filter acting on an input matrix layer with the row vector inner product computation to produce a destination pixel on a new convolved layer is shown schematically in the figure below.2
Many different filters may be employed one after the other to produce many feature maps, and all of the feature maps can be combined by producing the destination pixels in new convolved layers one by one, producing a final convolved layer matrix that can be flattened into a decisional vector.
A convolutional neural network thus uses filters to delineate and position prominent features, suppress specious features, and dampen noise, producing a series of convolved layers that allow faster and more refined feature extraction.
In supervised learning, a typical training set input might be a two-dimensional 480 × 480 pixel image presented to the CNN's initially “blank” input layer of random weights and biases. A 3 × 3 window matrix filter scans the training set input matrix for specific features such as edges, curves, and colors.
The filter can be composed of a sector of the input layer matrix itself, a specific feature filter such as the Sobel Gx edge filter, or a Normal (Gaussian) or other random distribution of element values that itself will be trained through multiple slides over the input layer matrix. However, different filters should use different kinds of initialization to avoid increasing the probability that the filters extract the same features.
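For instance, the Sobel Gx kernel mentioned above can be applied to a toy image containing a single vertical edge (the image and helper function are invented for illustration):

```python
import numpy as np

# The Sobel Gx kernel responds to horizontal intensity changes,
# i.e. vertical edges; Gy is its transpose.
sobel_gx = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=float)

# Toy image: dark left half, bright right half -> one vertical edge.
img = np.zeros((5, 5))
img[:, 3:] = 1.0

def apply_filter(f, h):
    """Slide h over f with stride 1, summing elementwise products."""
    kh, kw = h.shape
    out = np.zeros((f.shape[0] - kh + 1, f.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i+kh, j:j+kw] * h)
    return out

edges = apply_filter(img, sobel_gx)
print(edges)  # strong response in the columns straddling the edge
```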
As the CNN runs through the training sets, just like the artificial neuron layers, the filters also learn from training sets. As such, the filters can provide bottom-up feature information that hand-engineered filters (likely expressing preconceptions) may not be able to extract or locate, particularly in the case of objects or images with unusual or unexpected features.
Because of the translational or rotational invariance of a specific feature (move or rotate a cat, and it's still a cat), filters can map specific features into different positions on succeeding convolutional layers, thereby providing the spatial relationships among the features.
Furthermore, if an image is modified by crops, flips, rotations, color changes, blurring, and the like, and compared with the original, significant features can be enhanced and extracted with noise deleted.
For example, Google Brain's simple contrastive learning self-supervised algorithm (SimCLR), together with a downstream classifier, was applied to identify COVID-19 in medical imaging of lungs.
When the hierarchical feature map is completed, it is projected onto a final convolutional layer which is flattened (the matrix is vectorized) into a fully connected decisional column vector. Flattening a matrix is achieved by successively stacking the matrix's column vectors end-to-end to form a (very) long column vector.3
A softmax function, true to its name, is a max function that softens a ReLU by taming monotonically increasing values into the interval (0, 1) to provide greyscale,
where zi is the ith element of the vector z, and ξ(z)i is normalized by dividing each exponential function by the sum of the exponential functions to ensure that the values are between 0 and 1 as required for probabilities; ξ(z)i cannot be negative because the exponential function cannot be negative.
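The softmax normalization just described can be sketched directly (an illustrative implementation; the max-subtraction is a standard numerical-stability trick, not part of the formula above):

```python
import numpy as np

def softmax(z):
    # Subtracting the max does not change the result (it cancels in
    # the ratio) but prevents overflow in the exponentials.
    e = np.exp(z - np.max(z))
    return e / e.sum()  # normalize so the outputs sum to 1

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p, p.sum())  # positive probabilities in (0, 1) summing to 1
```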
The fully connected output layer is an N-dimensional column vector, wherein each element represents the probability that the input image is of a certain class, for instance in the classification of pet images, it will have high values in the feature maps that show round eyes, pink triangular noses, and pointy ears. The fully connected column vector displays probabilities for instance for the classification of household pets arrayed in the output vector as,4
and suppose the decision vector is
then the image presented to the CNN is most likely a cat.
Conv2D filters are typically used in the first few convolutional layers to extract high-level features, and are usually stacked in each convolutional layer. An Inception layer convolves filters of different sizes in parallel, from the finest (1 × 1) detailing to bigger (5 × 5) filters, thereby accessing fine detail while covering a larger area.
Inception layers were used for example in Google's cancer tumor diagnosis and in GoogleNet's winning the computer vision ImageNet Large-Scale Visual Recognition Challenge.
Detecting, segmenting, and locating an object in a scene requires distinguishing the object through the noise of other objects and the immediate surroundings. In these more complicated information transmission problems, using techniques such as bounding boxes to isolate, and rich feature hierarchies for multiple class recognition, allows more accurate reconstruction of the entire scene.5
There is of course a trade-off: multiple filters and many convolutional layers greatly increase the size of the artificial neural network and the computational burden. However, all of these calculations for convolutional neural network feature extraction can be performed very efficiently by packaged computational software freely accessible from, among others, Python TensorFlow and PyTorch.
A common problem encountered in the minimization of the Cost Function is that the gradient descent at times will suddenly slow down and even stop altogether while backpropagating through the hidden layers, never minimizing the Cost, and therefore ceasing to learn.
Learning by minimizing a quadratic cost function may be slow because when the neuron activation is very wrong, the Cost is very high, and the weights and bias parameter adjustments will take time, or because the minimization method has gone beyond the pale when confronted with a massive error, never able to iteratively overcome the error to lower the Cost to match the training set.
To overcome this problem requires a cost function that can accelerate gradient descent when encountering a large error. The Cross-Entropy Cost Function is based on a “measure of surprise” in information theory; that is, if a neuron's activation level is determined by an artificial neural network to be a(L) = 0, and the labeled target is y = 1, the cross-entropy registers a “surprise” that is dealt with as information entropy that must be rapidly reduced in order for a network to converge to accurately reflect the training set data.
Thermodynamical entropy is a measure of disorder or uncertainty in a system, and for spontaneous changes (what happens when a constraint on the system is removed), the disorder of the system will always naturally increase. This is encountered in everyday life as for example while walking, if your shoelaces loosen and start to disengage, they have been released from the constraint of being securely tied, and as you continue walking, your shoelaces will further unravel as your walking system continues to add energy. Your laces will never miraculously re-tie themselves as you walk, returning to a more ordered state, but rather the longer you walk, the looser your shoelaces become. Within your walking system, the shoelaces have gone from the more-ordered tied state to the more disordered untied state, and the entropy of your walking system has increased.
Entropy in action can be directly observed for instance in an array of computer cables below your desk; no matter how carefully initially arranged, at the next observation, they have all somehow mysteriously deteriorated into a hopeless tangle of high entropy disorder.
This spontaneous increase of entropy can be explained by the probability of occurrence of states in statistical mechanics. To illustrate, if five coins are tossed, the probability of coming up all heads or all tails is very low, but the probability of four heads and one tail (or vice-versa) is five times larger because five different arrangements of heads and one tail satisfy the criterion (since there are five different coins which can be the odd-out). For three heads (tails) and two tails (heads), there are ten cases each that satisfy the criterion (3-2 either way) and thus each is ten times as likely to occur. So the highly-ordered state of all heads or all tails has only one state, and the more disordered state of three heads and two tails has ten possible states. Therefore, the high-disorder state is much more probable because there are more available states.
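The coin-toss multiplicities above are binomial coefficients, easily checked (an illustrative sketch, not from the book):

```python
from math import comb

# Number of ways to get k heads from 5 coins -- the multiplicity
# of each macrostate; more arrangements means higher entropy.
for k in range(6):
    print(k, "heads:", comb(5, k), "arrangements")
# 0 heads: 1 way, 1 head: 5 ways, 2 heads: 10 ways, and so on,
# symmetrically back down to 1 way for all heads.

# With 100 coins there are 2**100 outcomes (roughly 10**30), so the
# all-heads state has probability 1/2**100 -- practically nil.
print(2 ** 100)
```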
If 100 coins are tossed, the total number of different combinations of heads and tails is about 10³⁰, and thus the probability of coming up all heads is practically nil at 1/10³⁰. For typical molecular systems of one mole (6 × 10²³ molecules) of gas, the possible degrees of freedom of the system are correspondingly enormous and the molecules will naturally spread out to where there are more available positions. That is why one never sees the molecules of a gas in a container spontaneously congregate in a corner all bunched together.
And that is also why your untied shoelaces will not re-tie themselves as you continue walking and your computer cables are always in a tangled mess. The tied shoelace and the separated, parallel cables are more ordered states with fewer degrees of freedom, while the untied shoelace and tangled cables have many possible different unordered states and are thus overwhelmingly more probable.
This also explains the frustration of never finding what you are looking for in the places where you expect it to be; the number of possible places where your sunglasses might be is much greater than the one place where they happen to be.
Of course, re-tying your shoelaces and disentangling your computer cables makes them more orderly, but the price to pay is increasing disorder in your body system (including the heat of psychic anguish) from having to expend energy to reorder those systems, and they will never be exactly back to the same original state (thermodynamic irreversibility). The result is that even if you re-order the system, the total entropy of the coupled systems (you and the shoelaces and cables) has still increased.
The direction priority of the natural spontaneous transfer of heat (energy) is popularly expressed by the Second Law of Thermodynamics as “the direction of energy transformation is always from a hotter place towards a colder place”; for example, putting a kettle of cold water on the stove burner will not result in what little heat the water has being transferred to make the burner flame hotter; the truth of the Second Law is heralded by the kettle whistle.
With every spontaneous event, the thermodynamical arrow of time flies only forward, and as natural events are deemed irreversible, they always proceed towards the greater disorder; for example, if your freshly baked apple pie slid off the kitchen table and splattered on the floor, you will not see it spontaneously reconstitute itself and fly back up to the tabletop.
Entropy therefore is present in any system, natural or artificial, and in particular in communications. Information entropy is the average rate at which information is produced stochastically from some source of data; the amount of information conveyed by an event is a random variable whose difference from its expectation value is determined by its amount of information entropy. A low information entropy means that the information conveyed is close to the expectation value, for example a clean voice transmission over a nearby wireless base station.
When a low probability event is found in an element of a high probability target vector, the error is large, and that event carries more disordered information than when a high probability event is found because the latter event more closely matches the target element; that is, correctly predicted events carry less information entropy, and unexpected events are more disordered and carry more information entropy.
This information entropy can be quantified in units of surprisal, with high information entropy registering a surprise to the observer so that first of all it will be noticed as an anomaly, and secondly that its entropy must be decreased by gradient descent backpropagation in order to match the expectation value.
For example in baseball, a pitcher hitting a home run, and in football, a goalie scoring a goal, are low probability events with a high information entropy surprisal that goes against expectations.
Information theory was conceived from electronic communications where clean signals are the expectation values. “Communication” means “the identification of data from a source by means of the transmission of an encoded signal”; information entropy provides an absolute limit on the shortest possible average length of accurate expression of the data in the encoded signal, and if the entropy of the source is less than the transmission channel's entropy capacity, the information is deemed a “lossless” communication.
For artificial intelligence, learning the correct recognition of an image or interpretation of data by an artificial neural network is just an exercise in performing lossless communication. Therefore decreasing the information entropy in a neural network results in more accurate recognition of input data, and this is just what the cross-entropy cost function is designed to do.
The cross-entropy cost function can be simply described, although the detailed mathematical theories are complicated. For an M = 2 binary classification (Yes, No), the cross-entropy cost function is,

CE = −[y log(p) + (1 − y) log(1 − p)]
where y is the resultant binary indicator (0, 1) and p is the predicted probability; the expression in square brackets is the sum of the outcome times the log of its predicted probability and the corresponding term for the only other possibility in this case. The minus sign in front of the square brackets indicates the negative for cost (or signal loss).
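A minimal sketch of the binary cross-entropy cost (illustrative; the function name is invented) shows how a confident wrong prediction registers a large "surprise":

```python
import numpy as np

def binary_cross_entropy(y, p):
    """CE = -[y*log(p) + (1 - y)*log(1 - p)] for one prediction,
    where y is the binary label and p the predicted probability."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# A confident, correct prediction has a low cost (little surprise)...
print(binary_cross_entropy(1, 0.99))  # ~0.01
# ...while a confident, wrong prediction has a very high cost.
print(binary_cross_entropy(1, 0.01))  # ~4.6
```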
For multiclass classification M > 2 (for example, classifying cats, dogs, goldfish, and horses), the cross-entropy cost function is the sum of the separate costs for each class label c per observation o,

CE = −Σc=1…M yo,c log(po,c)
where yo,c is the binary indicator (0, 1) of whether class label c is the correct classification for observation o, and po,c is the predicted probability that observation o belongs to class c.1
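The multiclass sum can be sketched the same way (illustrative; the class ordering follows the pet example above, and the probabilities are made up):

```python
import numpy as np

def cross_entropy(y_onehot, p):
    """-sum over classes c of y_{o,c} * log(p_{o,c}) for one
    observation o, with y_onehot the one-hot label vector."""
    return -np.sum(y_onehot * np.log(p))

# Four classes: cat, dog, goldfish, horse; the true label is "cat".
y = np.array([1.0, 0.0, 0.0, 0.0])
p = np.array([0.7, 0.1, 0.1, 0.1])
print(cross_entropy(y, p))  # only the true class term survives: -log(0.7)
```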
The cross-entropy cost function's information theory entropy underpinnings may appear complex, but as in many artificial intelligence constructs, the implementation just involves writing a simple program representing the above equations or downloading canned software, such as XENT Cross Entropy, letting it run, and seeing how well things progress, if at all.2
In many cases of artificial intelligence development, the engineering ethos of “if it works, go ahead” trumps the “knowing why and how” of physics. However, it is still true that if one wants to invent something new, or fundamentally improve something old, the theoretical bases of the technique will have to be understood.
But it is a fact that many of the recent successes of artificial neural networks have come about simply through experiment, trial-and-error, and heuristics, in a completely utilitarian manner that can be generalized, as expressed by the artificial intelligence pioneer Yann LeCun:3
You have to realize that our theoretical tools are very weak. Sometimes, we have good mathematical intuitions for why a particular technique should work. Sometimes our intuition ends up being wrong .... The questions become: how well does my method work on this particular problem, and how large is the set of problems on which it works well.
A convolutional neural network can be tuned to accelerate convergence and avoid overfitting or underfitting data by utilizing hyperparameters such as learning rate, stride, padding, and pooling; the computational burden can also be reduced by adjusting the resolution and dimension size of the convolutional layers. In practice, experience is often the best guide in choosing hyperparameterizations and combinations thereof.
A learning rate η adjusts the speed of gradient descent by specifying the size of its steps; bigger step sizes produce speedier learning, but too big a step size may skip over a minimum.
The transposed (superscript T) gradient vector of the Cost function with respect to m weights wk,

∇C^T = (∂C/∂w1, ∂C/∂w2, …, ∂C/∂wm)
if a small change in the variable wk with respect to the gradient is factored by the learning rate η,

Δwk = −η ∂C/∂wk
and the weights are iterated as,

wk → wk − η ∂C/∂wk
this then is just the process of gradient descent with an adjustable learning rate factor η specifying the descent step size. Adjusting the step size and thus the speed of convergence can also be used to stabilize the gradient descent of the cost function to avoid slowdown or stoppage. Trial and error is most often used to find the best gradient descent step size for optimal learning.
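The effect of the learning-rate factor η can be seen on a toy quadratic cost (an invented one-weight example, not from the book):

```python
# Gradient descent on the cost C(w) = (w - 3)**2, whose gradient is
# dC/dw = 2*(w - 3); the minimum is at w = 3.
def descend(w, eta, steps):
    for _ in range(steps):
        w = w - eta * 2 * (w - 3)  # w_k <- w_k - eta * dC/dw_k
    return w

print(descend(0.0, eta=0.1, steps=50))  # converges near 3
print(descend(0.0, eta=1.1, steps=50))  # step too large: overshoots and diverges
```

A moderate η walks steadily down to the minimum, while an η that is too large jumps back and forth across it with growing error.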
The stride is the step size of the convolutional filter as it slides over the matrix; it can be increased to reduce receptive-field overlap and produce faster coverage of the matrix layers while concomitantly reducing the computational burden. However, if the stride is too large, the filter may skip over or misinterpret some features.
Since the feature maps are the same size as the filters, they will be smaller than the size of the input layer, so the feature map matrix can be padded with zeros around the borders of the matrix to ensure that the filter and stride will successfully register with the convolved layer matrix.
The output size in height/length dimensions of a convolutional layer in terms of these hyperparameters is given by,

Output Size = (W − K + 2P)/S + 1
where Output Size is the height/length of the layer output matrix, W is the input matrix height/length dimension, K is the filter height/length dimension, P is the padding, and S is the stride.
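The output-size relation can be checked for the 480 × 480 input and 3 × 3 filter mentioned earlier (an illustrative helper, not from the book):

```python
def conv_output_size(W, K, P, S):
    """Output Size = (W - K + 2P) / S + 1 per spatial dimension."""
    return (W - K + 2 * P) // S + 1

# 480x480 input, 3x3 filter, no padding, stride 1 -> 478x478
print(conv_output_size(480, 3, 0, 1))
# the same filter with padding 1 preserves the 480 dimension
print(conv_output_size(480, 3, 1, 1))
```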
The choice of hyperparameters and filter dimensions will depend largely on the computer vision task at hand and considerations of computational burden.
With this in mind, a pooling (also called downsampling or subsampling) layer reduces the dimensions of a convolved feature by sweeping a 2 × 2 pooling matrix with stride 2 over the input layer matrix, outputting either the maximum value or the average value of each covered sector, thereby achieving significant dimensional reduction to lessen the computational burden.
Pooling is based on the idea that if a specific feature is known to be in the input matrix by having a high activation level, its exact position is not as significant as its position relative to other features. The resulting down-sampled feature maps are more robust with regard to changes in the position of the feature in the image, so its dimensions can be reduced; this is called local translation invariance. Pooling may also help to extract positionally and rotationally invariant dominant features, reduce noise, and avoid overfitting.
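Max pooling with a 2 × 2 window and stride 2 can be sketched as follows (an illustrative helper with arbitrary input values, not from the book):

```python
import numpy as np

def max_pool(f, size=2, stride=2):
    """2x2 max pooling with stride 2 halves each spatial dimension,
    keeping only the strongest activation in each covered sector."""
    h, w = f.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.max(f[i*stride:i*stride+size,
                                 j*stride:j*stride+size])
    return out

f = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [1, 2, 8, 7],
              [3, 4, 6, 5]], dtype=float)
print(max_pool(f))  # [[6, 4], [4, 8]]
```

The dominant activation in each sector survives, while its exact position within the sector is discarded, illustrating the local translation invariance described above.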
For an example of a convolutional neural network and its hyperparameterization, the AlphaGo policy network that defeated Lee Sedol comprises 12 hidden layers of convolution filters with zero padding and stride 1 to maintain the spatial dimension. The network takes 19 × 19 × 48 input features to represent the 19 × 19 board. The input layer uses 5 × 5 × 49 × 192 filters while the hidden layers all use 3 × 3 × 192 × 192 filters, and all the layers are rectified by a sigmoid or other greyscale function. The last convolutional layer is a 1 × 1 × 192 × 1 filter with different biases for each location followed by a softmax function. The value network is similarly constructed, but hidden layer 12 is an additional convolution layer, layer 13 is a 1 × 1 × 192 × 1 filter and layer 14 is a fully connected layer with 256 rectifiers. The output layer is a fully connected layer with a single tanh output. AlphaGo altogether uses 192 filters.
From this example, it is clear that a deep convolutional neural network can be an extremely complex structure embodying many different hyperparameterization techniques all of which will require a significant computational burden.
DCNNs are used, for example, in assembly-line manufacturing, in medical diagnostics, by Facebook to tag photos, and of course by self-driving cars, as well as for many other applications now and in the future, particularly robot computer vision.
By age three, children have already seen millions of objects, and can learn to name them so as to classify and store them in their memories. Thus the first task of computer vision artificial intelligence is similarly to learn to recognize objects, classify them, and store them in computer memory. Objects are identified by their particular features, for example the four legs, black nose, fur, and so on of dogs, and even though objects may have vastly different individual features, viewed in different poses and aspects, they must still be classified as dogs; the genus dog, for example, ranges from the tiny short-haired Chihuahua to the gigantic shaggy Sheepdog. However, they are also subject to further classification, as in the zoological class of mammals, the societal class of domesticated pets, and working guide, watch, search-and-rescue, retriever, and customs-inspection dogs, just to name a few. Cats, however, although mammals and domestic pets just like dogs, will never deign to work for humans.
Even an inanimate object like the very common hammer can be separated into the species claw, ball-pein, cross-pein, straight-pein, pin, club, mallet, joiner's mallet, soft-faced, nail-punch, woodcarver's, veneer, upholstery, sledge-, bench-, power-, and spring-hammers, all belonging to the genus hammer. Humans furthermore can identify a hammer from only a small segment appearing in a toolbox, for instance the edge of the claw, and from the size of the toolbox distinguish the claw hammer from a crowbar.
So even though they belong to the same genus, species can be vastly different as to appearance, character, and utility. From this, one with a mathematical bent might believe that the theory of sets and groups could form the basis of expert system top-down classification algorithms. However, such algorithms would be faced with a tangled web of if-then branching steps and taxonomic overlapping.
The migration from top-down to bottom-up classification for computer vision began when Princeton's Li Fei-Fei realized that since infants cannot innately recognize objects and scenes, the algorithm should learn through experience just as humans do. For a machine to be able to classify something, it would first have to be exposed to almost everything; that is, it required training on a very large dataset with every kind of example and detailed identification and distinguishing annotations regarding its particular classification; in other words, computer vision needed Big Data.
A project to build an object database for computer vision recognition was begun in which Princeton undergraduates collected and labeled images for $10/hour, but Professor Li quickly realized that relying on student power would take at least 90 years and cost millions of dollars even for a rudimentary database.
Just as she was despairing at the apparent demise of a good idea for want of a practical implementation, a graduate student told her about the new Amazon Mechanical Turk crowdsourcing website, which attracted tens of thousands of denizens of the Internet to take on individual project tasks on their own computers to earn extra income, albeit at a very low per-unit rate.
Professor Li immediately saw the Mechanical Turk as a data-gathering tool that could scale, and accessing the crowdsourcing website, she and her team supervised some 50,000 participants in 167 different countries to collect, classify, label, and manually annotate nearly one hundred million images for her project, including 62,000 images of cats alone.
Her ImageNet project took two and a half years to complete; however, when announced in 2009, it was rejected as a Computer Vision and Pattern Recognition (CVPR) research talk and relegated to a humble poster presentation in a corner of the convention hall.1
Moving to Stanford to become the Director of the AI Lab and to Silicon Valley to be the Chief Scientist at Google Cloud, Professor Li sought to test the efficacy of her Big Data approach to computer vision by proposing the idea of a contest for the accuracy of computer vision machine recognition of 1.2 million ImageNet images drawn from 1,000 different categories in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC).
In 2012, AlexNet's deep convolutional neural network (DCNN), split into two parts and partitioned across two GPUs, won the ILSVRC, and two years later, GoogLeNet's 22-layer, 9-inception-module DCNN not only defeated the competing machines but also routinely performed better than human beings at image recognition.2
This amount of data and the care of detailing classifications required for any useful recognition brings to mind Charles Darwin, who did just that for Earth's geological formations, flora and fauna.
On the toll of decades of observing, analyzing, collecting specimens, classifying, generalizing, recording, and ultimately formulating a theory, Darwin lamented in his Autobiography,3
My mind seems to have become a kind of machine for grinding general laws out of large collections of facts such that neither music nor literature nor appreciation of fine scenery held any pleasure any longer
Indeed Darwin was the ultimate human manifestation of a deep convolutional neural network algorithm for searching, classifying, and generalizing huge amounts of data. As the recognition capability of DCNNs grows, such a machine can do all the above without want of human pleasures, so academia can look forward to many Darwinian robots tirelessly performing observational science on flora and fauna, and indeed on any genus of matter.
Darwin's epochal 1859 book On the Origin of Species set forth the induction, deduction, and generalization that established the Theory of Evolution, one of the greatest scientific achievements of mankind.
Darwin first established the branching patterns of flora and fauna evolution based on his theory of natural selection; he later turned to human evolution in two books, The Descent of Man (1871) and The Expression of the Emotions in Man and Animals (1872). He thus amalgamated the classification of biological objects and their emotive expression (emotional knowledge).4
Particular traits either innate or nurtured of different nationalities, for example relatively reserved Germans and Japanese compared to more expansive Italians and Nigerians could also form a class (although fraught with the dangers of stereotyping) for deeper and more extended recognition. Woe to the Italian negotiator who believes she has closed the deal with a Japanese company because its representatives all nodded in seeming assent to her proposal, their response actually being no more than polite acknowledgment that they had heard what she was saying.
The enormous variation of expression across diverse cultures would require huge amounts of extremely subtle facial-expression and body-language data, all analyzed and minutely annotated yet still fraught with ambiguity.
The human visual cortex has 140 million neurons and billions of connections; by adulthood a person can easily recognize billions of objects and scenes in many different guises and settings.
To achieve this ultimate capability, the human brain operationally has a frontal lobe for thinking, emotion, and behavior; the motor cortex for movement; the sensory cortex for sensation including visualization; the parietal lobe for perception and mathematics; the temporal lobe for memory and language; and the cerebellum for balance and coordination, as shown in the schematic figure below.5
Can artificial intelligence mimic these biological constructs and be as perceptive as humans are, and be able to resolve the ambiguities?
A deeper artificial neural network and larger training sets can produce more accurate image recognition, classification, and inference, but DCNNs can also improve their range of discernment by augmenting the data to form a broader group, including other representations of an image while keeping its label the same. By emulating the biological brain's parietal lobe in recognizing rotational and translational invariance, segment identification, and association-group expansion, together with simple image-pixel shifts, horizontal and vertical flips, random crops, color jitter, translations, rotations, and so on, the original features are maintained but the class is expanded several-fold to include variations of the same image, without the need for fresh data.
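A minimal sketch of such label-preserving augmentation on a toy greyscale image (the function name and toy data are illustrative only):

```python
def augment(image):
    """Return label-preserving variants of a tiny greyscale image (a list of rows):
    flips, a one-pixel shift, and a crop, as described in the text."""
    variants = [image]                                   # the original
    variants.append([row[::-1] for row in image])        # horizontal flip
    variants.append(image[::-1])                         # vertical flip
    variants.append([[0] + row[:-1] for row in image])   # shift right, zero-filled
    variants.append([row[1:] for row in image[1:]])      # crop off the top-left border
    return variants

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
views = augment(img)   # five views of the same labeled example, no fresh data needed
```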
If a trained network with already tuned parameters is concatenated to a new network with relevant objectives, the new network can “fine-tune” the pre-trained network with only relevant new data that is fed to the new network to enhance a specific recognition capability.
This so-called transfer learning is implemented by freezing all the gradient-descent parameters of all the layers of the trained network, removing the fully connected layer and replacing it with the input layer of the new training network, and then proceeding to train the new network with data more relevant to the task at hand. In this way, the features already extracted by the pre-trained network do not have to be newly identified by the concatenated network; they are simply transferred, and the concatenated network can be taught new, more specific, and more subtle data.6
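A schematic sketch of the freeze-and-fine-tune idea (the class and update rule are illustrative, not a real framework API): frozen layers keep their tuned weights while only the new head is updated.

```python
class Layer:
    """A toy layer holding one weight; frozen layers ignore gradient updates."""
    def __init__(self, weight, frozen=False):
        self.weight = weight
        self.frozen = frozen

    def update(self, gradient, lr=0.1):
        if not self.frozen:            # frozen layers keep their pre-trained weights
            self.weight -= lr * gradient

# pre-trained feature extractor: freeze all gradient-descent parameters
pretrained = [Layer(w, frozen=True) for w in (0.5, -1.2, 0.8)]
# new head replacing the removed fully connected layer
new_head = [Layer(0.0), Layer(0.0)]

for layer in pretrained + new_head:
    layer.update(gradient=1.0)

frozen_weights = [l.weight for l in pretrained]   # unchanged by fine-tuning
head_weights = [l.weight for l in new_head]       # updated by fine-tuning
```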
The von Neumann computer architecture is serial; a central processing unit (CPU) processes programs and data that are fetched from a memory storage unit through a bus connected to the CPU. Each piece of information is assigned a memory location with a unique address and, after fetching, is processed sequentially by instruction cycles in step with a timing clock. Because all fetched information shares the bus, the CPU must always wait for instructions and data before it can proceed, which can considerably slow down operations.
Many speed-up techniques have been employed to decrease the latency (time between procedures), including adding an input/output processor, partitioning memory into banks, installing fast data caches, adding a coprocessor to perform some slower functions faster, pipelining for multiplex operation, and multiple CPU cores.
With these additions, von Neumann machines can handle most of today's routine computing tasks using a multi-core serial architecture organized as shown in the figure below, where the LN labels designate layers, LLC is the last-layer cache shared among the cores, and if the data is not in the caches, it is fetched from the DDR-4 memory.
The multiple cores naturally led to the idea of a vector architecture where the processor treats information as a vector of data elements instead of individual scalar data points, and instead of a single or multi-core processor, an array of processors.
Machine learning's voracious appetite for very large data sets meant processing speed was a paramount concern, and although advances were made, the serial von Neumann bottleneck seriously limited artificial intelligence machine performance.
While some operations which require knowing a result of one step in order to process the next step are inherently serial, for example conventional cryptography algorithms, other operations such as ray tracing in computer games and the matrix operations of machine learning can be performed by simultaneously fetching and running in parallel.
GPUs were originally designed for the graphical images of computer games, where an array of pixels (a raster) defines the image and animation is the change of image attributes (including textures and shading) over time. Ray tracing generates an image by tracing the path of light through the pixels in an image plane, producing realistic 2D renditions of 3D objects from the different ray paths; the pixels are stored in a matrix, and a graphics processing unit (GPU) comprising multiple CPU cores performs the calculations in parallel, as shown schematically in the figure below.1
Each processor cluster (PC) has multiple streaming multiprocessors (SMs), and each SM has a layer-1 instruction cache. An SM will fetch from a dedicated layer-1 data cache and a shared layer-2 cache before going to the global GDDR-5 memory; the cache layers in GPUs are generally fewer and smaller because GPUs are less concerned with memory latency as long as processing proceeds steadily. In this way, the GPU parallel-processing architecture can clearly increase computer throughput.2
Looking at the exemplary artificial neural network structure from Chapter 13 shown in the figure below, it is easy to see why parallel processing by GPUs would be superior to serial architecture CPUs, as each hidden layer's activation level change could be handled in parallel by a single graphics core processor.
A typical deep artificial neural network may handle 30 GB of elements and have millions of nodes. Utilizing GPUs instead of CPUs can cut processing time by two orders of magnitude.
In parallel-processing, the average parallelism is defined as the Work, which is just the total number of operations, divided by the Depth, which is the number of layers in the network, and gives the average number of computing elements,
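Average Parallelism = Work / Depth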
Because of the huge amounts of data, deep learning data processing requires a prodigious number of element operations, so mini-batches are processed in turn. Taking mini-batches will also promote greater accuracy because noise will be filtered out on average over the mini-batches. Finding the right size for the mini-batches is an exercise in just seeing what works best for the particular problem at hand.3
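The mini-batch split can be sketched in a few lines (function name and batch size are illustrative):

```python
def mini_batches(data, batch_size):
    """Split a dataset into mini-batches processed in turn (the last may be short)."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

dataset = list(range(10))
batches = mini_batches(dataset, 4)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```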
It should be evident that there is a limit to how much a computation can be sped up by parallel processing; the limit is given by Amdahl's Law,
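S = 1 / ((1 – P) + P/N)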
where P is the proportion of the system that can be made parallel (and 1 – P is the proportion of the system that remains serial) and N is the number of processors. As the number of processors N → ∞, S = 1/(1 – P), so the SpeedUp ultimately depends on the proportion of the system that cannot be parallel processed; this can never be zero as it is unavoidable that some operations depend on computations that came before.
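A worked example of Amdahl's Law (the workload figures are hypothetical): a job that is 95% parallelizable gains less than 6× on 8 processors and can never exceed 20× no matter how many processors are added.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: speedup S when a fraction p of the work is parallelizable
    and runs on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

s8 = amdahl_speedup(0.95, 8)      # about 5.9x on 8 processors
limit = 1.0 / (1.0 - 0.95)        # the n -> infinity ceiling: 20x
```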
The deep convolutional neural network GoogLeNet that won the 2014 ImageNet Large-Scale Visual Recognition Challenge has 6.8 million parameters and is 22 layers deep, running on the Nvidia Tesla V100 with 80 streaming multiprocessors (SMs), each with 64 cores. Software to run the Teslas includes VMware vSphere ESXi, which dedicates a GPU to a virtual machine using DirectPath I/O.
Google's tensor processing unit (TPU) AI accelerator ASIC is architecturally the same as a GPU but does not perform the graphical rasterization or texturization; it is specifically designed for machine learning neural network operations using TensorFlow software mundanely in high volume, low precision processing such as GooglePhotos, RankBrain search, computer vision, and sensationally in AlphaGo's match with Lee Sedol.
The TPU has progressed from a simple 8-bit matrix-multiplier chip manufactured in a 28 nm process to versions adding 600 GB/s of memory bandwidth and reaching 45 teraFLOPS per chip, arranged in four-chip modules, and thereupon doubling the processing power using 16 chips per module.
The world's biggest AI chip circa 2020 is the Cerebras Wafer Scale Engine (WSE) with 1.2 trillion transistors, 400,000 cores, 18 GB on-chip memory one clock-cycle away from the cores, and 100 petabits/second memory bandwidth (rate at which data read from and stored to memory) comprising a low-latency, high bandwidth input/feedback chip specifically designed for machine learning. The WSE is 56 times larger, has 3000 times more memory, and 10,000 times the memory bandwidth of Nvidia's largest GPU, and purportedly will reduce the interconnection transmissions latency from weeks to minutes, and increase processing speed with massive numbers of computation cores alongside memory, ideally providing distributed computing power for more data-intensive machine learning.4
High-performance computing competitions are nothing new. For years America or Japan held the title of world's fastest computer until China's Sunway and Tianhe machines claimed it; in this big-power technology leadership race, where national prestige is at stake, America retook the lead in 2020, only to see Fujitsu's Fugaku surpass it in 2021 with a 442 petaFLOPS machine. The back and forth among the three computing powers can be expected to continue well into the future.
The Summit supercomputer at Oak Ridge National Laboratory in Tennessee was built by a consortium of IBM and Nvidia/Mellanox for the US Department of Energy; it is a parallel-processing system covering an area the size of two basketball courts, employing 9216 IBM Power9 CPUs, 27,648 Nvidia Tesla V100 GPUs, 2.41 million cores, and 250 petabytes (PB) of memory, with a speed of 148.6 petaFLOPS (peak 200 petaFLOPS). It is currently used for cosmology simulations, climate modeling, and medical research.
Summit's sister supercomputer, Sierra, was built for the National Nuclear Security Administration and is located at Lawrence Livermore National Laboratory in California. With a design similar to Summit's, it has 8640 CPUs, 17,280 GPUs, and 1.38 PB of memory, with a peak speed of 125 petaFLOPS. It is used almost exclusively for nuclear weapons simulations, and its work is therefore understandably highly classified.
Japan's Fugaku, located in Kobe, was built by Fujitsu and Japan's national research institute Riken; curiously, it does not use GPUs for speed but rather 158,976 ARM 48-core system-on-chip (SoC) processors for the artificial intelligence analysis of automobile collisions, Big Data processing, and covid-19 protein folding.5
In addition to the purely scaling up of components to produce greater computational power, the esoteric physics of quantum mechanical wavefunction superposition and entanglement has also entered the artificial intelligence fray with the ever-faster and almost infinite bandwidth of quantum computers.
When driven by n qubits that can process in parallel the 2^n possible superposition states between the classical 1 and 0 bits simultaneously, the quantum computer collapses the wavefunction of a process to observable probabilistic states of n bits. In this way, an almost unlimited number of entangled, unobservable quantum states on the way to collapsing into observables may be processed, or communicated to other quantum computers using quantum encryption; that encryption would be theoretically impossible to decipher because the entanglement of the states before observation cannot be known with certainty. For example, only n = 50 qubits means 2^50 ≈ 10^15 possible states; from this, the potential of supermassively parallel quantum computers is clear.
By networking an array of high-n quantum computers, tremendous computational power can be used in chemical process prediction, material science simulations, pure mathematics, and the very deep artificial neural network matrix computations of artificial intelligence.
The operations of quantum mechanics depend on representing and storing large complex tensors (scalars, vectors, and matrices); performing linear algebra operations on the tensors requires exponential amounts of neural network memory storage and processing power.
A quantum perceptron model using qubit neurons can exploit the huge quantum information storage capability of quantum computers by encoding an m-dimensional input and its parameterization on quantum hardware using N qubits (m = 2^N).6
In September 2019, Google and NASA announced the attainment of “quantum supremacy” by the superconducting 54-qubit (53 operational) quantum processor Sycamore, which solved the random circuit sampling problem in just 200 seconds, a calculation that the then world's fastest supercomputer, Summit, would take billions of years to complete.7
In December 2020, the University of Science and Technology of China in Hefei announced that their Jiuzhang optical photonic processor also achieved quantum supremacy by performing Gaussian boson sampling (GBS) in 3 minutes, a computation that the Sunway TaihuLight supercomputer would take 2 billion years to complete.8
The problems chosen to demonstrate quantum supremacy, however, are endemic to quantum computing itself, rendering “proofs” of quantum computing speeds somewhat contrived. Nevertheless, the potential of 2^n simultaneous parallel computations augurs well for a bright, albeit minatory, vision of the future of artificial intelligence computation.
Predictive analytics deduces correlations in data to predict future performance from past behavior. For example, if a veteran NBA player with a lifetime 3-point shooting average of 35% has made all of his six 3-point shots in the first half of a game, the opposing coach will tell his shell-shocked team, “Don’t worry, he can’t keep that up!”
If the coach means that because the shooter has made six straight shots, he more likely will miss the next shot, he is engaging in the gambler's fallacy, for any given 3-point shot is stochastic, meaning that shot success does not depend on the success or failure of the previous shot; if the coach means that the shooter will cool off in the second half, he is only slightly more accurate, but if he means the shooter will regress to the mean over the whole NBA season or for the rest of his career, he is correct in his assessment by the law of large numbers.
That is, the shooter may remain on fire throughout the game, and although this is of scant comfort to the opposing coach of this game, with a confidence interval as measured by the standard deviation at the 90% level, the veteran 35% shooter will never perform, over the larger sample size of whole seasons, at greater than a standard deviation from his average, meaning that no player, however great, can escape the regressive clutches of Bernoulli's law.1
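The season-level claim can be sketched with a seeded toy simulation (all numbers invented for illustration): hot streaks happen in single games, but over a season-sized sample the average settles near the shooter's true percentage.

```python
import random

random.seed(42)                     # fixed seed so the run is reproducible
P_MAKE = 0.35                       # the veteran's true 3-point percentage
shots = [1 if random.random() < P_MAKE else 0 for _ in range(2000)]
season_avg = sum(shots) / len(shots)
# standard deviation of the sample mean is sqrt(p*(1-p)/n), roughly 0.011 here,
# so season_avg lies within a few hundredths of 0.35
```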
More practically, predictive analytics is useful for predicting hardware and vehicle breakdowns in manufacturing and transportation, personal health problems, epidemics, and any correlation with a causal relationship because “there is no causation without correlation”, but of course “correlation does not imply causation, it might be just coincidence”, and coincidence to the technologist is just noise.
In an important game, the stars’ performances, role players’ contributions, the coach's strategies and tactics certainly are factors in the game's outcome, but many players, coaches, and owners also believe that the hotels where the teams stay, what the players wear before the game, the color of their sneakers, the songs in their pregame music headphones, and all manner of superstition are also factors in the game's outcome.
They believe that there are correlations because of some coincidences they have noticed, and who among us has not held the same beliefs, even in the knowledge that they are unwarranted? The problem here is to separate the significant factors from the insignificant “noise” and discover the principal correlations as causes. This can be done by collecting large samples of data over long time periods and allowing the law of large numbers to give the answer.
In this sense, the real value of predictive analytics lies not in simple correlations such as the end of Summer and the increased sales of school supplies, which are readily apparent and demonstrated by years of retail statistics, but in discovering latent factors and inferences from the statistical data.
The simplest regression model fits a straight line that best bisects the data points, as shown in the figure, for a data set with one independent variable x and one dependent variable y, in a linear regression model defined by the equation of a straight line,
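y = mx + b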
where m is the slope of the straight line and b is the y-axis intercept.
一旦根据数据确定了斜率和 y 截距,那么从任何新的独立变量x的输入,就可以预测y 的输出,对于x的值,不一定完全准确,但该值将等于许多x输入的平均值。
Once the slope and y-intercept are determined from the data, from any input of a new independent variable x, the y output can be predicted, not necessarily completely accurately for that value of x, but a value that will equal an average over many x inputs.
A linear regression model can also help to quantify the strength of the relationship between the independent and dependent variable by the variance of data points from the straight line; that is, if almost all the points are close to the line (as in the figure above), a strong dependence is indicated, and if the points are randomly scattered relatively far away from the line, a weak dependence is indicated.
Linear regression typically uses the least squares method, which minimizes the variance, that is, the sum of the squares of the errors, to find the slope and y-intercept. This is done by first calculating the mean of the x values and the mean of the y values from the data plot above,
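x̄ = (1/n) Σ xi   and   ȳ = (1/n) Σ yi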
finding the slope m from,
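m = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)²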
and the y-intercept from,
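b = ȳ – m x̄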
If there are more correlations depending on more independent (input) variables, multiple linear regression (also called multivariable linear regression) produces, for a particular dependent variable y (output), a linear sum of the multiple input variables xi, each multiplied by a regression coefficient βi that weights the significance of that variable, plus a residual error term ε whose distribution function is used to adjust the values of the βi,
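y = β0 + β1x1 + β2x2 + … + βkxk + ε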
For more complicated correlations, higher powers of the independent variables xi, such as x² and x³, or interactions of the independent variables, such as x1x2, or even elementary functions of the independent variables, such as sin(x) and cos(x), can also be used to fit the data. However, fitting regression curves too closely to the data points can lead to overfitting and a loss of the predictive model's generalization capability.2
If there are n observations on the k + 1 independent variables, then for the ith observation of the dependent variable,

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + … + β_k x_ik + ε_i

This can be written in compact vector/matrix form as a general linear model,

y = Xβ + ε

where y is a column vector holding the y_i elements, X is a matrix arraying the independent variables x_ij, β is a column vector holding the regression coefficients β_i, and ε is a column vector holding the error terms ε_i.
For example, a company wants to optimize the time expended in delivering soft drinks to vending machines; the independent variables are (1) how many bottles to stock (x_1) and (2) the distance driven by the deliverer (x_2) in each run to the machines. Employing a statistics software package such as SPSS, a plot command producing a matrix of scatter plots can assess whether there are linear relationships among the data. If there are, the regression coefficients β_i and error ε_i can be calculated by Gaussian elimination (typically by means of the determinants of the matrices) using off-the-shelf simultaneous-equation solver software. The linear relation of delivery time as a function of the number of bottles and the driving distance can then be found in a fitted regression model equation of the form y_est = β_0 + β_1 x_1 + β_2 x_2 with calculated regression coefficients,
where y_est is the estimated delivery time, and it can be seen that the dependence of delivery time on the number of bottles x_1 is much greater than on the driving distance x_2.3
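A fit like this can be reproduced with NumPy's least-squares solver; the delivery data below (bottles, distance, time) are invented for illustration, not the book's dataset:

```python
import numpy as np

# Hypothetical delivery records: first column is the intercept term,
# then bottles stocked (x1) and distance driven in meters (x2); y is minutes.
X = np.array([[1, 7, 560], [1, 3, 220], [1, 3, 340], [1, 4, 80],
              [1, 6, 150], [1, 7, 330], [1, 2, 110], [1, 7, 210]], dtype=float)
y = np.array([16.7, 11.5, 12.0, 14.9, 13.8, 18.1, 8.0, 17.8])

# Solve y = X @ beta in the least-squares sense (beta = [b0, b1, b2]);
# lstsq stands in here for the Gaussian-elimination solver mentioned above.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_est = X @ beta   # fitted delivery times
print(beta)
```

Comparing the magnitudes of β_1 and β_2 (after accounting for the units of x_1 and x_2) then shows which variable dominates the delivery time.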
After presumably optimizing the process by reducing the number of bottles stocked per machine and driving to more machines in a delivery run, if after many runs there is no significant improvement in the company's overall delivery time, then there may be hidden correlations at work, such as the refill percentage requirement of each vending machine, and the number of bottles delivered for each brand of soft drink. These factors can then be included in the analysis to see if there is any decrease in the overall delivery time, and if so, these correlations should be included in the model as new independent variables.
The general linear model can be expanded to handle the effects of interdependent multiple correlations in a multivariate linear regression model,

Y = XB + E

where the dependent variable Y is a matrix with each column holding the estimations of one of the dependent variables y as functions of the weighted independent variables, the independent variable X is a matrix with each column being a set of observations on one of the independent variables x as a function of the other independent variables, B is a matrix of parameters to be adjusted to fit the data, and E is an error (noise) matrix that is assumed to be uncorrelated across observations and to follow a multivariate normal distribution.
For example, a medical researcher collects data on the measurable independent health variables weight, blood pressure, and cholesterol level of a cohort population, together with further data on the red meat, fish, milk, and alcohol consumed by the cohort per week. A multivariate linear regression model can determine the interrelated correlations among each of the health and diet independent variables, and the errors (such as false reports of low alcohol consumption) are assumed to follow a normal distribution.
Simple linear, multiple linear, and multivariate linear regressions can thus be distinguished by the scalar, vector, and matrix representations respectively, as shown in the equations above, and the coefficients of the terms will reveal the inter-relationships.
Of course, parameter calculations, hypothesis testing, analysis of the models, and the extraction of further information from intermediate steps can become quite complex, and are the subjects of ongoing research.
Linear regression is widely used in science and engineering data analysis, and in predictive analytics in the biological, medical, behavioral, economic, and social sciences. A common application is trend estimation, where curve movement data can represent a trend; for example, in epidemiology, a linear regression model found a direct negative correlation between a cigarette-smoking independent variable and a smoker's-lifespan dependent variable. In finance, the capital asset pricing model uses linear regression and beta (whether a stock is more or less volatile than the market as a whole) to quantify the systematic risk of an investment. In economics, linear regression is widely used in almost all areas of prediction, from economic downturns to inflation. And in artificial intelligence, linear regression is one of the fundamental learning algorithms used in supervised machine learning.
A Restricted Boltzmann Machine (RBM) is an early artificial neural network with only an input (visible) layer composed of a vector v, one hidden layer composed of a vector h, and no output layer. There are no node connections among the artificial neurons within a layer (hence the "restricted" adjective in its name). An RBM learns probability distributions (the probabilities of many different possibilities at once) based on the free energy of the distributions to determine the ground-truth probability distribution of unstructured input data.
The RBM structure is a Markov chain random field of connected nodes where the joint probability of the neuron activations in the layer vectors "h given v" and "v given h" can be represented by that free energy, which is a measure of the stability of the probability distribution; that is, the less free energy in the system, the more stable the system, meaning, for artificial intelligence, the closer it is to the ground truth.
The Gibbs free energy of physical chemistry is a measure of the thermodynamic potential of a state of matter. For example, on Earth, H2O has three phases: liquid, solid, and vapor. At room temperature and atmospheric pressure, although there is some water vapor in the air (the humidity), the liquid state of H2O has the lowest free energy and is the most stable of the three states; so as ice cubes in an ice tray on the kitchen table melt, free energy is released in the phase transition of ice to liquid water, and as liquid water evaporates into the air, free energy is released in the phase transition of liquid to vapor, and the free energy of the system is decreased.
In other words, under standard temperature and pressure (STP) conditions, liquid water is the most stable state of H2O, meaning that it has the highest probability compared to the other states, which exist but have more free energy, are less stable, and are therefore less probable.
If the temperature is considerably higher than 100°C or considerably lower than 0°C, or the pressure is not atmospheric, vapor or ice respectively could have less free energy and be more stable, meaning that if conditions change, the ground-truth probability distribution will be different.
An RBM, starting from a random initial distribution, compares its distribution with the distribution of the input data; the difference (error) is just the free energy of the candidate distribution, so just as in other artificial neural networks, minimizing that free energy by backpropagation will cause the RBM-generated probability distribution to converge to the probability distribution of the input data, and thus reveal the ground-truth probabilities and the latent inferences hidden therein.
The Gibbs free energy of a pair of Boolean vectors (v, h) representing the visible v and hidden h layers is given by the RBM energy function,

E(v, h) = −Σ_i a_i v_i − Σ_j b_j h_j − Σ_i Σ_j v_i w_ij h_j

where a_i is the activation energy of the ith neuron, v_i the binary state of the neurons in the visible input layer, and h_j the binary state of the neurons in the hidden layer; the a_i and b_j are the elements of the bias vectors, one for each layer, and w_ij the elements of the weights matrix W.
The thermodynamic probability P_i of the ith state of a system having energy E_i at temperature T is given by the well-known Boltzmann distribution (hence the name of the machine),

P_i = e^(−E_i / k_B T) / Z,   Z = Σ_(j=1..M) e^(−E_j / k_B T)

where M is the number of all possible states of the system, k_B is the Boltzmann constant (relating temperature and energy), and Z is the canonical partition function that normalizes the equation to values between 0 and 1 as required by probability.
For an RBM, the joint probability P(v, h), from which the probabilities of v given h and of h given v follow, depends on the RBM energy function E(v, h) and is given by,

P(v, h) = e^(−E(v, h)) / Z_rbm

where Z_rbm is the canonical partition function summed over all possible pairs of visible and hidden states.
At a given point in time, the RBM-generated probability distribution is in accord with the RBM energy function E(v, h), which is determined by the parameterized activation levels of the neurons in the visible and hidden layers. In feedforward mode, the RBM is thus acting as an autoencoder.
Calculating the probabilities of all possible states of v and h is computationally prohibitive, so instead the conditional probabilities of h given v and v given h are employed,
Since each neuron activation level is binary, it can only be 1 or 0, so the weight and bias parameterizations are factors that, of course, take effect only when the neuron activation level is 1 and not 0. For given neuron activation levels of the visible layer v, the probability that a single neuron in the hidden layer h is an activated binary 1, with its level adjusted by the shared weights w_ij modulating the visible-layer neurons v_i, is,

P(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij)    (24.1)

where σ is the sigmoid function. There are two sets of biases in the RBM autoencoding: the hidden-layer floor biases that activate some neurons regardless of any lack of relevant data points, and the input-layer biases that accelerate learning on the backpropagation passes.
In the same fashion, the probability that, for given neuron states of the hidden layer, a visible neuron is an activated binary 1, with its level adjusted by the shared weights w_ij modulating the hidden-layer neurons h_j, is,1

P(v_i = 1 | h) = σ(a_i + Σ_j w_ij h_j)    (24.2)

Equation 24.1 determines the activation probabilities of the hidden neurons (so-called Gibbs sampling) for h given v, where v is initialized by a random Bernoulli distribution (binary yes or no, 1 or 0, as in a fair coin toss). Equation 24.2 determines the activation probabilities of the visible neurons for v given h; together the equations produce the conditional probabilities of h given v and v given h.
The difference between the initial random Bernoulli probabilities and the input data will likely be large. Feedforward runs and iterative backpropagation adjusting the weights w_ij and biases b_j will minimize that difference, producing reconstructions of the probabilities that are better and better approximations of the unknown input-data probability distribution.
In unsupervised learning, the RBM performs forward and backward passes between the visible and hidden layers: in a backward pass, the activations of the hidden layer are the inputs to the input layer, multiplied by the same weights, and the sum is added to the input-layer bias at each input-layer node, thus constituting iterative reconstructions of the input layer.
The parameter adjustment is best performed not by the gradient descent of artificial neural networks but rather by so-called contrastive divergence. After k iterative runs, the adjusted input-values vector v_k is iteratively reconstructed from the original input vector v_0 and used to determine the activation levels of the hidden vectors changing from h_0 to h_k. The update matrix ΔW is the difference between the outer products ⊗ involving the vectors v_0 and v_k,2

ΔW = v_0 ⊗ h_0 − v_k ⊗ h_k

The new weights matrix is then calculated using gradient ascent (with learning rate η),

W_new = W + η ΔW
The RBM-generated probability distribution can reveal latent inferences from the "features" displayed by the neuron activation levels h_j in the hidden layer h.3
For example, suppose the RBM is presented with a dataset of a survey of users aged 18-29 of an on-demand movie website, comprising millions of data points. The users have chosen among the movies Star Wars, The King's Speech, The Kissing Booth, The Matrix, Harry Potter, and The Three-Body Problem, rating each film as liked (1), not liked (0), or not seen (−1).
For this particular group, the RBM finds that the probability distribution shows high ratings for Star Wars and The Matrix, indicating a strong "like" inference for science-fiction films; but the hidden-layer feature neurons also show that those who like The Matrix also like Harry Potter, a cross-correlation revealing a latent factor for fantasy in collaborative filtering inference.
Now The Three-Body Problem, a new movie that no one has yet seen, has no rating (−1) in the survey, but since the RBM discovered a preference for science fiction with a fantasy inference, it is highly probable that a specific user from that 18-29 age group will like The Three-Body Problem, and if inferences are further based on that particular user's data, the new movie can be confidently suggested to the user.
Whereas regression models estimate a continuous dependent variable based on the independent-variable data input, and ANN classification models compare features extracted from the data with the features in labeled datasets, RBM reconstruction attempts to model the probability distribution of the original input data by generating better and better approximations to the input-data distribution through iteratively minimizing the error, and thus is a form of generative learning.4
For example, if the unknown input-data probability distribution p(x) and the reconstructed probability distribution q(x) are both normal distributions but have slightly different shapes and only partially overlap, the difference is the Kullback-Leibler divergence, which measures the differing areas under the two probability distribution curves, as shown in the figure below.5
RBM contrastive divergence minimizes those differing areas by adjusting the weight and bias parameters, thereby iteratively modeling the unknown data probability distribution with the reconstructed probability distributions.
The integrated difference, shown at right in the figure, is the Kullback-Leibler divergence D_KL(P ║ Q) of the unknown probability distribution P and the reconstructed probability distribution Q.
Probability distributions are based on the probabilities of a set of outcomes. For example, in rolling dice, out of a total of 36 possibilities, the probability of a lucky "7" is six times higher than the probability of snake eyes "2", since there are six ways for the dice to add up to 7 and only one way to roll snake eyes. The probability distribution appears as a bell-shaped curve with the "7" at the peak and "2" and double-six "12" on the wings.
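The dice distribution can be enumerated exactly by counting the 36 equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two dice and count each sum.
sums = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
probs = {s: Fraction(n, 36) for s, n in sums.items()}
print(probs[7], probs[2])   # 1/6 and 1/36: a "7" is six times as likely as snake eyes
```

The counts rise from one way (for 2 or 12) to six ways (for 7), tracing out the peaked curve described above.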
Conditions can differ, just as in the non-STP conditions for the phases of H2O. In speech recognition, for instance, e, t, and a are the most commonly used letters in the English language, while in Icelandic the most common letters are a, r, and n, so the probability distribution curves for alphabet-letter occurrence in speech are substantially different. Therefore, using an English alphabet probability distribution on Icelandic speech will result in a large free energy, and the reconstruction will require several contrastive divergence runs to minimize the Kullback-Leibler divergence and produce a better Icelandic alphabet distribution for speech recognition.6
An RBM is typically used initially in unsupervised learning to first model the unknown input distribution, and RBMs can be stacked for deep learning. The RBM hidden layer can act as a preprocessor, substituting its hidden-layer distribution for its input layer and feeding its final reconstructed input layer to the input layer of a feedforward artificial neural network, so that the ANN has a generative-learning head start on determining the input data's ground-truth distribution.
An RBM can serve as a preprocessor for convolutional neural networks performing image and text recognition, and, in conjunction with recurrent neural networks, can also do speech recognition. But the RBM's main claim to fame is with regard to latent factors in collaborative filtering.
Regression analysis and the restricted Boltzmann machine can reveal the overt and hidden features and correlations in raw input data, but deeper latent tendencies may remain buried in the data. Stanford University's Parallel VLSI Architecture Group has employed machine learning to analyze the hidden layers of artificial intelligence neural networks to draw "inferences from inference at the data center", meaning to obtain hidden information by delving more deeply into not only what the original network reveals, but also what can be further inferred from those revelations and their implications.1
Further processing of cloud-based artificial neural network results can reveal latent factors that improve the types of machine recognition that require a great deal of inference to be accurate, for example in speech, translation, facial expression, body language, and precepts of human behavior.
In marketing, it is well known that purchasing decisions are not necessarily made on the ostensibly rational bases of price, quality, and utility, but for instance rather on the latent factors of fashion and prestige. Conformance to common fashion trends implies steadily lower pricing and thus less regard for quality so that more people will follow the trend, whereas high fashion implies a prestige that warrants overpricing to maintain the exclusivity that a high price ensures. In neither case is utility a factor.
The inference for the luxury goods seller is that, unlike a common goods retailer, they should never lower prices or engage in discount sales promotions, for that would be contrary to the buyer's desideratum of exclusivity.
So what does the luxury goods seller do when a product simply does not sell or is going out of fashion and unsold inventory is piling up? Big luxury brands' dirty little secret is that they allow their employees, sworn to secrecy, to buy unsold inventory at bargain prices. In this way, prices are never publicly lowered, inventory is cleared, and the employees who have acquired luxury items at a discount have little incentive to reveal the secret, as that would be detrimental to their own image.2
In the marketing of high-priced luxury items, advertisements and commercials set in high-class settings populated by the stylish upper class are overtly placed to attract new customers, but are covertly designed to mitigate the post-purchase dissonance that overpriced items invariably engender, by reinforcing the wisdom of the buyer's original choice, with the inference of promoting future purchases.
At the other extreme, online purchasing is largely utility-oriented, with decisions made after price comparisons and online comments from previous buyers regarding quality, so price-cutting on useful goods and peer reviews are critical to sales. Indeed, customer ratings and comments are even more important for the purchase of common goods and for the more subjective promotion of a song to listen to or a movie to watch.
The on-demand video website Netflix has found that factors such as age, gender, level of education, and demographics, or even the browsing history of the user, although useful, are by themselves inadequate predictors of movie choice.
Artificial intelligence has revealed that movie streaming decisions are often based on the number of upvotes (likes), ratings, shares, and peer reviews: in toto, the online buzz generated by a film.
Based on this buzz factor, which is a form of online survey, Netflix has classified users into taste clusters; for instance, appreciative viewers of the film The King's Speech may not be interested in the high-school film series The Kissing Booth, but nevertheless, based on latent inferences, teenagers with real or imagined speech impediments may like both, and Netflix will recommend "you may also like" films outside your designated cluster that have been flagged by the latent inferences of collaborative filtering.
Because there is a limit to how many movies a person can view in a given period of time, Netflix must ensure that users are happy with their choices; so, for example, using a restricted Boltzmann machine to predict inferentially what they will like, Netflix can recommend "twenty movies that are guaranteed to make you cry" to empathetic users, with further differentiation based on love, death, and animal and pet stories, and thereby also present some "out of the box" recommendations to entice new interests, all to keep Netflix's all-important churn rate below 4%.
Netflix's successful use of inference surveys and artificial intelligence is a prime factor in its potential to disrupt the entire entertainment industry. The television movie-dominating HBO has 150 million subscribers, but they were acquired through the TV cable companies, so HBO had no direct access to viewer data on which to perform artificial intelligence predictive analytics; but with its acquisition by AT&T, HBO becomes the on-demand movie hub of the telecommunications giant's subsidiary WarnerMedia, and now, with data for predictive analysis, can presumably make up ground in the online streaming video business dominated by Netflix.
In response, Netflix's Chief Content Officer Ted Sarandos, who bought the rights to the House of Cards series that launched Netflix's take-off into the online media world, posed the question very cogently for the future of the industry at large: "Will we become HBO before HBO becomes us?"3
The complexity of consumer behavior has spawned an entire academic discipline of artificial intelligence marketing psychology, and an enterprising Adtech service industry that employs machine learning and Big Data to go beyond intuition to delve more deeply into consumer behavior. Indeed, Adtech's discovery of latent factors driving the deeper psychology of purchasing decisions may be the genesis of some ostensibly very unusual advertisements and commercials.
Predictive analytics performed by restricted Boltzmann machines and artificial neural networks' generative learning can reveal the latent factors of decision-making through so-called collaborative filtering, which is based on the hypothesis that "people like things similar to other things they like, and things that are liked by other people with similar tastes".
Seemingly an utterly obvious premise; however, the similarities and tastes are latent in the collation and still must be discovered as second-degree inferences from the first-degree inferences. Predictive analytics first seeks the low-hanging fruit of surveys, concert and box-office receipts, DVD store sales, and so on, and then collects further data from music and movie websites on online browsing, "likes", "shares", comments, and social network discussions. Minimization by contrastive divergence then reveals the second-degree "inferences from the inference center". This more sophisticated data analytics can be used to predict the success of new, yet-unheard songs and unseen movies, which can then be utilized in recommender systems to generate buzz and concomitantly produce more data for more, and perhaps deeper, latent-factor filtering.
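The "people with similar tastes" half of that hypothesis can be sketched as a minimal user-based collaborative filter; the users, movies, and like(1)/dislike(0) ratings below are invented for illustration, with taste similarity measured by cosine similarity of rating vectors:

```python
import math

# Hypothetical ratings over four films:
# [Star Wars, The Matrix, The Kissing Booth, Harry Potter]
ratings = {
    "ann":  [1, 1, 0, 1],
    "bob":  [1, 1, 0, 1],
    "carl": [0, 0, 1, 0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Ann's tastes align with Bob's, not Carl's, so items Bob likes
# (but Ann has not yet seen) would be recommended to Ann.
sim_bob = cosine(ratings["ann"], ratings["bob"])    # close to 1
sim_carl = cosine(ratings["ann"], ratings["carl"])  # 0
print(sim_bob, sim_carl)
```

An RBM goes further than this neighborhood method by learning hidden "taste" features (the latent factors) rather than comparing users directly, but the underlying premise is the same.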
Everyone now is aware that the surveys and constant requests for comments are not only meant to impart the feeling that the website has your interests at heart and strives to best serve your needs, but also to contribute your choices to their Big Data. Sales of consumer data by certain websites and analytics firms are proof of the data's commercial value.
In addition to the ostensible factors of romance, action, biography, fantasy, science fiction, and animation preferences in film choice, suppose that a latent "redemption theme" is collaboratively filtered out; the members of a taste cluster may then find redemption-themed film recommendations on visiting the website, a predilection of which the viewers themselves might not have been wholly aware.
Some time ago, in 2005, Netflix held an open competition for the best collaborative filtering algorithm to predict the attractiveness of new, yet-unseen films. The analysis was based on the hard data of film buzz and the soft data of collaborative filtering; the winning algorithm was performed on a restricted Boltzmann machine.4
The esoteric discipline of exoplanet hunting by astronomers using space telescopes also depends on the latent factors in collaborative filtering of periodic dips in the light of faraway stars, as inferences of the signatures of putative exoplanets, possibly bearing intelligent beings, orbiting a star.
The nuances of an orbiting planet's periodic effect on a star's light were revealed by the deep artificial neural network AstroNet-K2, which automatically removed instability and noise from the star's light signals by extracting brightness-over-time light curves for the star in question to find anomalies that betrayed the existence of an orbiting exoplanet. By autonomously filtering out other periodic light variations, thereby deleting false positives, AstroNet-K2 claimed 98% accuracy on training datasets of found exoplanets, priming it for the inference of the existence of new exoplanets.
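AstroNet-K2 itself is a deep network trained on Kepler data; as a far simpler illustration of the underlying idea (spotting periodic transit dips in a brightness-over-time light curve), here is a sketch on synthetic data, with the period, dip depth, and noise level all invented:

```python
import random

random.seed(1)

# Synthetic light curve: constant stellar brightness with small noise,
# plus a periodic transit dip every 50 time steps (hypothetical exoplanet).
PERIOD, DIP_DEPTH, N = 50, 0.02, 500
flux = [1.0 + random.gauss(0, 0.001) for _ in range(N)]
for t in range(N):
    if t % PERIOD < 3:          # each transit lasts 3 time steps
        flux[t] -= DIP_DEPTH

# Naive detection: flag samples well below the median brightness...
median = sorted(flux)[N // 2]
dips = [t for t in range(N) if flux[t] < median - 0.01]

# ...then infer the orbital period from the spacing between dip groups.
starts = [t for t in dips if t - 1 not in dips]
periods = [b - a for a, b in zip(starts, starts[1:])]
print(starts[:3], periods[0])
```

A real pipeline must of course reject the "other periodic light variations" (starspots, eclipsing binaries) the text mentions, which is what the neural network's learned filtering provides.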
Periodic instabilities in the light signal from the star as disturbed by the exoplanet have revealed a latent factor that has been conjectured to be an exomoon orbiting an exoplanet that is itself orbiting the star.
Astronomers have long believed that since, for instance, Saturn alone has 82 moons, large and small, it would be entirely reasonable that planets around other stars would also have moons. Researchers sifting through brightness data from 284 exoplanets found by the Kepler Space Telescope indeed spotted the telltale smaller secondary brightness dip following an exoplanet's signature dip while transiting its star, and after confirmation by the Hubble Space Telescope, the inference was established that exoplanet Kepler-1625b has a moon as massive as the Earth and four times its diameter.5
In July 2019, researchers using the Atacama Large Millimeter/submillimeter Array (ALMA) in Chile inferred from fuzzy splotches in millimeter-wave patterns 370 light-years away that the young planet PDS 70 c orbiting the T Tauri star PDS 70 has a circumplanetary disk one-fourth the mass of the Earth's Moon. Further studies have found numerous exomoons not so far from Earth.
Computer vision pattern recognition by a DCNN, together with data analytics, could be employed on images from the giant radio telescope arrays in Chile and China to search for more exoplanets and exomoons.6
AstroNet-K2 could only spot the type of exoplanets that it had learned to recognize, but with reinforcement learning and self-supervision it could begin to think for itself to discover exoplanets with different signature characteristics, and like AlphaGoZero and the AI Video Gamer in their pursuits, could outperform even the most expert human astronomers in finding exoplanets and exomoons.
In doing so, the AI Astronomer will be assisting humankind in what is perhaps its ultimate undertaking, finding other intelligent beings in the Universe. It has been estimated that in our Milky Way Galaxy alone, almost every star has some orbiting planets and that there are therefore more than one trillion exoplanets and even more exomoons to be discovered.
Thus, humankind can either create intelligent beings here on Earth in the form of thinking robots, or find those developed by superior beings in our own Galaxy. Given the one trillion exoplanets developing over the 14 billion years of the Universe, and taking the Earth as an instance, intelligent beings will surely evolve on some of those exoplanets, and like us, they will surely develop artificially intelligent machines.
With two trillion galaxies in the Universe and one trillion exoplanets per galaxy, two trillion trillion (2 × 10²⁴) exoplanets have had and will have billions of years to develop some form of life, so by sheer dint of numbers and time, there is little doubt that intelligent beings and their intelligent creations are on exoplanets and exomoons in the far reaches of all of the galaxies.7
In this sense, the development of intelligent robots appears inevitable in some galaxy, and in spite of its threat to humankind's dominance on Earth, it is incumbent on us humans to do the same, if for no other reason than to maintain our own or our electromechanical progeny's relevance in the Universe.
Classical statistical analysis such as regression analysis relies on the law of large numbers, which means that as the number of observations tends to infinity, the empirical probability distribution function inevitably converges to the ground-truth distribution function. The recent successes of artificial neural networks have largely rested on today's unprecedented Big Data, which provides the vast quantities of numbers as training sets for machines to learn to classify and predict with ever greater accuracy.
There are, however, many disciplines that just do not have sufficient data to meet the requirement of the law of large numbers, so AI machines were devised to provide classification in cases of limited data.
A Support Vector Machine (SVM) takes subsets of data organized into data vectors and separates those vectors into classes by producing a demarcation that best divides the data vectors into classes on opposite sides of it; in two dimensions, the demarcation is simply a line or curve; in three dimensions, it is a plane; and in higher dimensions, it is a non-visualizable hyperplane.
The algorithm focuses on the data vectors closest to the dividing demarcation, called the support vectors because they most significantly “support” the demarcation. Other data vectors in the assigned class but farther away from the optimal hyperplane do not define the classification and will have little influence on the position of the optimal hyperplane, but they will be accurately classified.
The distance of the support vectors from the hyperplane is called the margin; it can be thought of as the width of a “street” running through the data vectors. The goal of an SVM is to find the demarcation that creates the maximum margin of the support vectors (the broadest street), thereby providing the optimal (most clear-cut) hyperplane separating the data, correctly classified on the appropriate side of the street's margins.
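As a rough sketch of the "widest street" idea (not the exact optimizer a production SVM uses), a soft-margin linear SVM can be trained by sub-gradient descent on the hinge loss; the toy points, learning rate, and regularization strength below are assumptions for illustration:

```python
import random

random.seed(0)

# Linearly separable 2D toy data: label +1 (triangles) and -1 (circles).
pts = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((2.5, 3.5), 1),
       ((0.0, 0.0), -1), ((1.0, 0.5), -1), ((0.5, 1.5), -1)]

w, b = [0.0, 0.0], 0.0
LAM, LR = 0.01, 0.05   # regularization strength and learning rate (assumed)

# Sub-gradient descent on the hinge loss max(0, 1 - y(w.x + b)) + LAM*||w||^2.
for _ in range(2000):
    for (x, y) in pts:
        m = y * (w[0] * x[0] + w[1] * x[1] + b)
        if m < 1:    # point on or inside the street: push it toward its side
            w = [w[i] + LR * (y * x[i] - 2 * LAM * w[i]) for i in range(2)]
            b += LR * y
        else:        # point well outside: only shrink w, which widens the street
            w = [w[i] * (1 - 2 * LR * LAM) for i in range(2)]

# Support vectors are the points lying on or near the margin boundaries
# y(w.x + b) = 1; the street width is 2/||w||.
support = [x for (x, y) in pts
           if abs(y * (w[0] * x[0] + w[1] * x[1] + b) - 1) < 0.25]
width = 2 / (w[0] ** 2 + w[1] ** 2) ** 0.5
print(len(support), round(width, 2))
```

Note how only the points near the street determine the final demarcation, exactly as described above.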
The figure at right depicts a two-dimensional space of data vectors to be classified, with the different class vectors represented by circles and triangles and classified by the optimal line whose margin is a measure of the severity of the separation in the vector space. The clear circle and triangle vectors represent the “support vectors” that are closest to the optimal dividing line and thereby delineate the margin (width) of the dividing classification; that is, they are the front-line “supporters” of the classification.
If a simple two-dimensional line cannot be found to distinctively separate the data vectors, for instance where the circle data vectors are bunched up and surrounded by triangular data vectors as shown in the figure below at left, the data vectors can be mapped by a process called kernelling onto a three-dimensional space. The circle and triangular data vectors, now “floating in 3D space”, are separated by figuratively stretching out the classifying demarcation line to form a flat plane and rotating the plane in space so as to best separate the circle and triangular data vectors and find the optimal plane with the margin of the support vectors, as shown schematically in the figure below at right. In physics terms, the expansion to three-dimensional space has created one more degree of freedom for the segregation of the data vectors into different classes.1
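The kernelling picture can be made concrete with one simple (assumed) feature map, z = x² + y²: circle points bunched near the origin land below a horizontal plane in 3D, while the surrounding triangle points land above it.

```python
# 2D data where circles (inner cluster) are surrounded by triangles (outer
# ring): no straight line separates them, but lifting each point to 3D with
# z = x^2 + y^2 (one simple choice of feature map) makes the classes
# separable by the horizontal plane z = 2.
inner = [(0.5, 0.0), (-0.3, 0.4), (0.0, -0.6), (0.2, 0.2)]   # circles
outer = [(2.0, 0.0), (0.0, 2.5), (-2.2, 0.3), (1.8, -1.5)]   # triangles

def lift(p):
    x, y = p
    return (x, y, x * x + y * y)   # now "floating in 3D space"

# Every inner point lands below the separating plane, every outer point above.
print([round(lift(p)[2], 2) for p in inner])
print([round(lift(p)[2], 2) for p in outer])
```

The flat plane z = 2, viewed back in the original 2D space, is a circle separating the two classes, which no straight line could do.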
If the three-dimensional planar separation of the data vectors still does not clearly segregate the data vectors into classes, the space can be expanded into the higher dimensions of hyperspace, to provide more degrees of freedom for segregation of data vectors by finding hyperplanes. This kernelling of the space to four, five, six, and higher dimensions (in principle even to an infinite-dimension Hilbert space), although not visualizable, can be continued until the classification separation reaches a clearly definite optimal hyperplane (which in some cases may not be realizable).
The margins in any dimension can be calculated by the inner product of the support vectors and vectors orthogonal to the hyperplane producing scalar invariant margin “widths”. These are the extent in hyperspace of the optimal hyperplane margins; that is, the minimum distances of the support vectors away from the hyperplanes. Clearly, the wider the margins, the more definite the classification.
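The scalar margin "width" described above reduces to the familiar point-to-hyperplane distance |w·x + b| / ‖w‖, an inner product with the unit vector orthogonal to the hyperplane; a minimal sketch (the hyperplane and support vector are invented):

```python
# Distance from a data vector x to the hyperplane w.x + b = 0: the inner
# product of x with the vector w orthogonal to the hyperplane, offset by b
# and normalized, gives a scalar "width" valid in any dimension.

def margin(w, b, x):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = sum(wi * wi for wi in w) ** 0.5
    return abs(dot + b) / norm

# Hyperplane x1 + x2 - 3 = 0 in 2D, with a support vector at (1, 1):
print(margin([1.0, 1.0], -3.0, [1.0, 1.0]))  # its distance from the line
```

The same function works unchanged for vectors of any length, which is why the margin remains well defined in non-visualizable hyperspace.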
The Euler-Lagrange equation and Lagrange multipliers can be employed to find the function that minimizes the margin extent under the constraint that the support vectors must be the vectors closest to the hyperplane (see the Appendix for the Euler-Lagrange equation and Lagrange multiplier).
The resulting optimal hyperplane expression, unfortunately in non-linear form, produces a decision vector from a group of samples, and the optimization equation must be solved using numerical analysis, but since all of these computations are being done by a computer anyway, this is just another step in the SVM algorithmic process.2
It also can be shown that the decision vector expression is always concave so that there is only one global extremal, which avoids the problems of local minima and maxima endemic to gradient descent in artificial neural networks.3
The marvelous aspect of support vector machines is that in the kernelling transformation, the kernel can be simply represented by inner products in any dimension, and various transformation kernels can be tried, for example polynomial and exponential functions of the inner product vectors to find the optimal hyperplane and margins. Furthermore, there are fewer parameters than in typical ANNs to do the fitting, and significantly there are no local extremals to plague the results.
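That a kernel is simply an inner product in some feature space can be checked directly: for the degree-2 polynomial kernel, evaluating (x·y)² in the low-dimensional input space gives exactly the inner product of the explicit degree-2 feature maps, while an exponential (RBF) kernel is equally cheap to evaluate despite corresponding to an infinite-dimensional space. A small sketch:

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Polynomial kernel evaluated entirely in the 2D input space.
def poly_kernel(x, y):
    return dot(x, y) ** 2

# The explicit degree-2 feature map it secretly corresponds to.
def phi(x):
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 0.5)
print(poly_kernel(x, y), dot(phi(x), phi(y)))  # identical values

# An exponential (RBF) kernel: an infinite-dimensional feature space,
# yet just one line to evaluate.
def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

This is the "marvelous aspect": the high-dimensional mapping never has to be computed explicitly, so various kernels can be tried cheaply.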
Support vector machines are typically used for text classification, spam filtering, image recognition, color classification, and handwritten digit recognition, currently used in post office automation.
An example of the use of support vector machines in the chemical sciences and engineering can highlight the research approach and the solutions that SVMs can achieve.
Experimentation, analysis of the data obtained, and an organizing hypothesis for making predictions for process engineering and new discoveries based on the analysis of those findings have been the general procedure of scientific and engineering research in chemistry, chemical engineering, materials science, environmental science, and pharmacology.
The analysis and prediction of chemical reactions has been done largely by classical statistical methods such as regression analysis performed on reaction data. However, most reaction processes are complicated, non-linear, and multivariate, with copious noise, all of which has made the efficient and accurate extraction of useful chemical research information extremely difficult.
Chemical reactions are influenced by temperature, pressure, concentration, catalytic activity, solvents, initial and boundary conditions, and myriad situational factors. The character and behavior of the materials involved are also affected by their chemical composition, phase, particle size, impurities, and other factors. Pharmaceutical drug design depends on the shape and charge of large organic molecules to complement the biomolecular target and is so complicated that it can only be done by computer modeling.
Chemical and metallurgical engineering processes in industrial manufacturing involve heat transfer, mass transfer, fluid flow, chemical reactions, reaction series, and typically more than five or six simultaneous processes that must be feature selected from dozens of possibilities, and all optimized for efficient plant operation.
Because of the many determinants relevant to sui generis chemical reactions and material processes, and the even greater number of particulars to consider for industrial processes in in-situ operations, the solution to a practical problem involves a bewildering montage of disparate chemical agents; this, plus the necessity of separating the relevant factors from the irrelevant noise (which, because of the complexity of the processes, is always considerable), altogether renders the predictions of any model fraught with difficulty and uncertainty.
For example, in a typical petroleum plant, the incoming crude oil from different countries is essentially and compositionally different, and the composition of crude oil from different tankers, even from the same source, can vary substantially. Within the transport and storage holds, the temperature and pressure conditions differ from container to container and change with time, the catalytic activity also changes over time, and there are ongoing exothermic chemical reactions that can induce chaotic chemical and physical changes, and so on ad infinitum: an enormous number of constantly changing factors affect the chemical and thermodynamical processes of the plant. These factors render predictions regarding the chemical composition of the oil at delivery and during processing extremely uncertain.
Even if one can manage amidst all the vagaries of chemical reactions, different physical states, and environmental and industrial plant conditions, the separation of relevant process factors from noise can further vary with just the size of the sample.
A linear relationship certainly makes life simpler for the hypothetical organizing regression function, but almost all the complicated chemical reaction processes of research interest are non-linear, and the simple straight line of linear regression treats all non-linear data as noise. A simplistic linear model thus risks underfitting the data, while the use of more complicated functions with many terms and adjustable parameters to fit non-linear data can result in so much adjustable parameterization that it overfits the data.
The use of multivariate regression, because of the large number of variables and their often very complicated non-linear interactional relationships, typically results in a model having too many parameters, rendering its predictions and theoretical implementation results too variable to have much use.
The organizing function can instead be changed to a polynomial regression model, based on the fact that any continuous function can be represented by a series of polynomials with an infinite number of terms (Weierstrass' theorem); appropriately truncating the series may adequately approximate the function if multi-term polynomials of various degrees with different coefficients can better fit the data. But this requires many more terms and parameters, again bearing the very real risk of overfitting the data and losing generalization for prediction and practical implementation.4
Any continuous function can also be approximated by an artificial neural network, but ANNs are also plagued by the twin gremlins of underfitting and overfitting, and in the case of computational chemistry, there are too many chemical physics parameters for good modeling and too little data to depend on the law of large numbers to reach a useful generalizable result.
Because support vector machines can provide classification in a very high-dimensional hyperspace, there is in principle no limit to the number of factor segregations possible, and the SVM appears ideal for the highly complex computational chemistry of industrial processes that have limited experimental data.
However, because of the myriad different conditions and operational factors in complex processes, some refinement of the data must first be employed to improve the efficiency of classification; for instance, outlier deletion, where the data samples exhibiting large errors in supervised training can be leave-one-out (LOO) cross-validated, thereby purging the data of at least some of the more obvious noise.
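A sketch of the LOO screening idea, using a plain mean as a stand-in for the trained model (the data values and the 3×-mean-error cutoff are invented for illustration):

```python
# Leave-one-out (LOO) screening: each sample is held out in turn, the model
# (here just the mean of the rest, standing in for the trained classifier)
# predicts it, and samples with unusually large held-out error are flagged
# as outlier candidates for deletion.
data = [2.1, 1.9, 2.0, 2.2, 9.5, 1.8, 2.05]   # 9.5 is deliberate noise

def loo_errors(xs):
    errs = []
    for i, x in enumerate(xs):
        rest = xs[:i] + xs[i + 1:]
        pred = sum(rest) / len(rest)    # "train" on everything but x
        errs.append(abs(x - pred))      # held-out prediction error
    return errs

errs = loo_errors(data)
threshold = 3 * (sum(errs) / len(errs))   # crude cutoff, an assumption
outliers = [x for x, e in zip(data, errs) if e > threshold]
print(outliers)
```

In a real SVM workflow, the "mean of the rest" is replaced by retraining the classifier on the remaining samples, but the screening logic is the same.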
Then SVM kernelling can be performed: the high-dimensional feature space of hyperplanes is handled implicitly through computations in the lower-dimensional input space. In this way, non-linear processes can be treated by SVM kernelling using the inner products of the data vectors, involving a smaller number of adjustable parameters.
Support vector machines furthermore can treat both linear and non-linear processes at once, so the problem of underfitting can be reduced, and if the number of parameters in the organizing hypotheses can be limited, a happy medium can hopefully be found between under- and overfitting. The global extremal for finding the optimal hyperplane also makes the solution unique and therefore dependable.
Support vector machines are used in research of atomic parameter pattern recognition, thermodynamics, molecular structural relationships modeling, materials analysis, trace element studies, archeological chemistry, and in the practical industries of battery manufacture, petroleum engineering, cancer cures, and the design of chemical, materials, and pharmaceutical drugs.5
Reinforcement learning (RL) was critical to AlphaGo's victories over the Go Masters, but the esoteric game, no matter how intriguing, is not that well known in Western countries (at least before AlphaGo). On the other hand, the Toronto/DeepMind Video Gamer's very public defeat of expert video gamers without even knowing the rules of the game beforehand was an eye-opening event entirely in tune with modern everyday life, especially among the young.
Learning by rewards and punishment is seemingly altogether common sense, and is used not only in raising children and workaday life, but also in many disparate disciplines such as game theory, automatic control, information theory, operations research, and even animal psychology.
Machines employing reinforcement learning were successful in game-playing against humans because the machine learned just like human beings learn, from experience with rewards and punishments, and any gamer knows that the more games you play, the better you will get, but unlike humans, a machine's skill can be tirelessly honed through millions of games against not only expert humans, but other machines, and once establishing its superiority, it can play against earlier versions of itself, ultimately going far beyond the skill of the best humans through machine self-strengthening.
In reinforcement learning, an agent in a given state can perform an action chosen from a set of all actions that the agent can take with respect to the domain in which it finds itself, changing its action as necessary to adapt and confront changes in that environment. In a game-playing algorithm, the input is the agent's action in the current state of the domain, and the output is the reward or punishment emanating from the action taken in response to that state, resulting in a consequent state of the agent and the environment after that action.
From this it can be seen that RL can employ a Markov chain designating rewards and punishments to constitute a Markov Decision Process (MDP) in which the agent in the current state has all the information regarding that state that is needed to decide on a new step. This is of course the same as a move in chess or Go that changes the state of the game board and presents a new state to the player after every move, requiring a decision based on the new state.
Reinforcement learning operates as a simple MDP feedback loop running in successive time increments, as shown in the figure below, where the subscripts denote time t and its increment t + 1. Reinforcement learning essentially constitutes a sequence of state-action pairs (s_t, a_t) that are performed with respect to rewards (high positive values of r_t and r_{t+1}) and punishments (low or negative values of r_t and r_{t+1}).
The agent seeks to select actions that maximize the sum of rewards over time. The Q-learning algorithm finds the value of an action in a particular state by the Markov Decision Process, which over time provides the Q-learning function with an optimal policy whereby the value of each action is represented by a Q-function that updates the state-action pairs (s_t, a_t) commensurate with the rewards of subsequent actions. Then the highest combination of the immediate reward with all possible future rewards gained by later actions is determined using the Bellman equation value-iteration update, a weighted average of the old value and the new information,

Q_new(s_t, a_t) = (1 − α) Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a)]
where Q_new(s_t, a_t) is the new Q value, α is the learning rate, the expression in square brackets is the learned new temporal value wherein r_t is the reward, γ is the discount factor, and max_a Q(s_{t+1}, a) is the estimate of the maximum future value.
Q is initialized with an arbitrary value; then, as time t progresses, the agent selects an action a_t, notes the reward r_t, and enters a new state s_{t+1}, and Q is updated iteratively to Q_new to produce the action value for the new state, which should steadily increase as experience is gained, thereby learning how best to play the game.
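The iterative update just described can be sketched as tabular Q-learning on a deliberately tiny, invented chain MDP (five states, two actions, a goal that pays 10 and a per-step cost of 1; α, γ, and the exploration rate are arbitrary choices):

```python
import random

random.seed(0)

# Tiny chain MDP: states 0..4, actions 0 = left, 1 = right;
# reaching state 4 pays reward 10, every other step costs 1.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q initialized arbitrarily

def step(s, a):
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    r = 10 if s2 == GOAL else -1
    return s2, r

for _ in range(500):                        # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection, then the Bellman update:
        if random.random() < EPS:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        s2, r = step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda act: Q[s][act]) for s in range(N_STATES)]
print(policy)   # the learned policy heads right, toward the goal
```

After training, the greedy policy read off the Q table sends the agent rightward from every non-goal state, exactly the behavior the accumulated rewards favor.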
The Q-function can consider delayed rewards introduced in later time steps in the game sequence, and can act recursively through program nesting to encompass those rewards in the algorithm computation.
The goal of the agent is to maximize the total reward. It does this by adding the maximum reward attainable from future states to the reward for achieving its current state, effectively influencing the current action by the potential future reward. This Goal is a weighted sum of the expected values of the rewards of all future steps, expressed by,

Goal = max Σ_t γ^t r(s_t, a_t)
This is just the maximization of a sum of rewards r(s_t, a_t) over time t multiplied by a discount factor γ raised to the power t, where s_t is the state at a given time and a_t is the action at that time.
The Q-function provides a numerical score for the action taken based on its effect on the environment by mapping each state-action pair to a number, determined by experience of the rewards that best contribute to the states' reaching the Goal.
For example, a pick-and-place robot is being trained by an RL agent controller to give the robot a positive reward for picking up the object and placing it in the designated position, but if the robot drops the object, places it in the wrong place, or does nothing at all, it is given a low or negative punishment number as “reward”.
After running the game-playing algorithm many times in training, the Q-function selects the (s_t, a_t) pair with the highest Q value from that experience. Using feedback from the environment, a scalar reward is sent back for each new action.
The discount factor γ essentially gauges the relative importance of immediate rewards versus future rewards. Since it has a value between 0 and 1 (0 ≤ γ ≤ 1), raising γ to the power t means that if γ is small, then as time goes by (t increases) the reward is multiplied by a fast-decreasing factor γ^t and the value of the Goal decreases very quickly, so as γ → 0 the reward is “myopic” (nearsighted) in the sense that the more immediate goals are important to reaching the Goal (for example in ping-pong). On the other hand, a larger γ does not reduce the value of the Goal so rapidly because as γ → 1, the reward is “hyperopic” (farsighted) in that the reward, multiplied by a slower-decreasing factor γ^t, maintains a higher value longer, thereby attaching more relative importance to longer-term goals (for example in chess and Go).
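The myopic-versus-hyperopic effect of γ can be seen directly by evaluating the discounted Goal sum for an immediate small payoff versus a delayed large one (both reward sequences invented for illustration):

```python
# Effect of the discount factor: the same reward sequence is worth very
# different amounts to a myopic agent (small gamma) and a farsighted one
# (gamma near 1).
def goal(rewards, gamma):
    # Discounted sum: each reward at step t is weighted by gamma^t.
    return sum(gamma ** t * r for t, r in enumerate(rewards))

late_payoff = [0, 0, 0, 0, 10]    # big reward only at the final step
early_payoff = [3, 0, 0, 0, 0]    # small reward immediately

for gamma in (0.1, 0.9):
    print(gamma, goal(late_payoff, gamma), goal(early_payoff, gamma))
```

With γ = 0.1 the immediate 3 outweighs the distant 10, while with γ = 0.9 the distant 10 dominates, mirroring the ping-pong versus chess contrast above.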
In operation, γ also mathematically prevents the summation in the Goal equation from exploding to ∞ and hanging up the computation. The discount factor γ can be hand-engineered or machine-learned to maximize the probability of reaching the Goal after going through different trajectories in the activity. In complex environments, selecting the best action among the many choices commensurate with a given state requires ranking the quality of actions, based on a measure of the value of the (s_t, a_t) pairs; that is, how much they further the positive accumulation of rewards, and a policy function π based on those values maps a state s_t to the best known action a_t,

π(s_t) = argmax_a Q(s_t, a)
The value of a given action depends on the environment, the state in which it is performed, and the time at which it was taken. Reinforcement learning runs the agent through sequences of (s_t, a_t) pairs, noting the resulting rewards and calculating the Q-function until it produces the best trajectory for the agent to take through the maximization process, in effect establishing the policy function π.
The policy function must of course avoid simply repeating the same actions or moves that previously garnered the highest rewards (overfitting), for that may cause the agent to forego actions with possibly higher rewards, so in addition to exploitation of old avenues, exploration of new branches should be included in the algorithm; the ratio of the two is set by a parameter є, exploring with probability є and exploiting with probability 1 − є,
where the more daring agents will be so-called є-greedy.
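An є-greedy agent can be sketched in a few lines: with probability є it explores a random action, otherwise it exploits the best-known action (the Q values and є below are invented for illustration):

```python
import random

random.seed(2)

# epsilon-greedy action selection: with probability epsilon explore a
# random action (a daring shot), otherwise exploit the best-known action.
def choose(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))               # exploration
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploitation

q = [0.2, 1.5, 0.7]   # learned action values for the current state
picks = [choose(q, 0.1) for _ in range(1000)]
print(picks.count(1) / 1000)   # the greedy action dominates, but not always
```

Raising є makes the agent more daring; lowering it makes the agent rely on tried-and-true moves.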
For example, in an early electronic video game played at MIT, the paddle agent's successfully hitting the ping-pong ball back has a positive reward of continuing to the possibility of gaining a point, and missing the ball has a negative reward of losing a point, with the Goal of course being to amass a certain number of points sooner than the opposing player.
Anyone who has played ping-pong, tennis, handball, or racquetball knows about the doughty returner who just returns every shot, and the attacker who takes chances with daring shots. The ε-greedy player's exploratory shots can result in the immediate reward of a winning point, but are easier to miss and can result in the punishing loss of a point, whereas tried-and-true shots exploit the delayed reward of steady return play, banking on an opponent's error. The reward function thus reflects the ε-greedy player's percentage of missed attacking shots.
After many games, the discount factor γ will have modulated the rewards commensurate with the player's skill level, as revealed by the iterative accumulation of reward and punishment over the multiple games; the best average results thus will produce the best policy function for the particular player, one that emphasizes his skills and discounts his weaknesses.
Thus, the reinforcement learning Q-function maps state-action pairs to rewards to find the value of an action. A computer vision convolutional neural network can be employed to recognize a state; for example, the image of a barrier and its surroundings confronting Super Mario represents a state, and after the CNN recognizes the barrier, the policy function π based on those values maps the state st to the best known action at. The Q-function ranks the possible actions that the agent can perform in that state to overcome the barrier (for example, jumping a wall or avoiding a swinging door): jumping over the barrier will give Super Mario 10 points; going around the barrier, because it takes more time, gives only 5 points; and hitting it head-on will result in a −5 point punishment.
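The tabular Q-learning update behind this ranking can be sketched as follows. The state names, the rewards (10, 5, −5), and the learning rate and γ are illustrative stand-ins following the Super Mario example above, not values from the book:

```python
# One step of the tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    best_next = max(Q[s_next].values())  # value of the best known follow-up action
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Illustrative state-action table for the barrier state.
Q = {"barrier": {"jump": 0.0, "around": 0.0, "head_on": 0.0},
     "past_barrier": {"run": 0.0}}
rewards = {"jump": 10, "around": 5, "head_on": -5}
for action, reward in rewards.items():
    q_update(Q, "barrier", action, reward, "past_barrier")

# After one sweep, jumping ranks highest: 5.0 vs 2.5 vs -2.5
print(Q["barrier"])
```

Iterating this update over many (st, at) sequences is exactly how the Q-function converges toward a ranking that the policy π can exploit.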
It is important to realize that there are no physical principles at play in reinforcement learning and there is no need for supervised training, although it can be employed as a head-start. RL proceeds through Q-learning, which iteratively chooses paths of actions that produce higher expected values based on the rewards and punishments of the actions.
Reinforcement learning algorithms thus are supremely generalizable as they learn from the accumulation of their own experience in a given situation just as humans do. Thus, like humans, they can explore and handle many different tasks completely bottom-up with no top-down hand-engineering, and since they have no subjective predilections, unlike humans, they can objectively proceed based entirely on the Q-learned optimal policy; after tireless 24/7 practice, they can easily defeat the emotionally impaired and time-constrained human without even knowing the rules of the game a priori!
AlphaGo's artificial neural network emulates the human brain's network of neurons that are activated by input stimuli to form ideas by synaptic network connections producing “thought” patterns. DeepMind's original version of the artificial neural network for playing Go had a 19 × 19 × 48 volume matrix input layer and 13 filter-convolved hidden layers fully connected to a softmax layer and decision vector.
AlphaGo's hardware comprised 1920 CPUs, 280 GPUs, and in the matches against Korea's Lee Sedol and China's Ke Jie, employed Google's accelerating Tensor Processing Unit (TPU) ASIC.
Because of the almost infinite number of possible moves (2 × 10^170), a Monte Carlo tree search was used as an expansion of possible moves. After each move, the subsequent branches of moves are offered and chosen, and finally, through simulated playout of a game, the value of the moves can be determined.
AlphaGo learned how to play first through supervised learning, in this case by playing a training set of published professional Go matches. Then by gradient descent it minimized the cost of being wrong when compared with the labeled optimal moves in the training set, and by adjusting the initially randomly assigned weight and bias parameters of the neuron activation levels through backpropagating the differences, AlphaGo learned how to play as well as a professional Go master.
At this point, AlphaGo could easily defeat amateur players, but it had to improve to be able to play with professional Go masters, so after supervised training, AlphaGo was trained by reinforcement learning to generate a value network resulting in a playing policy, and then refined its policy by playing against improving versions of itself in self-supervised learning.
The reinforcement learning could use the γ discount factor to help evaluate rewards for both exploitive immediate local fights and the delayed exploration of remote positions.
While the best players like Lee Sedol and Ke Jie are always aggressively looking for and exploiting local fights, AlphaGo was expected to play a cold, computerized, Deep Blue-type top-down style to provoke and then engage in those fights to gain territory, but reinforcement learning taught AlphaGo also to be ε-greedy, often foregoing potential gains in a local fight to adventurously explore new board positions.1
For instance in Game 1 against Lee Sedol, AlphaGo preferred to take an exploratory sente initiative in the upper left of the board, and forego the perceived wisdom of aji, the “savoring” of possibilities in a classic local fight, a move that many commentators regarded as critical to victory in Game 1.
Being ε-greedy can open up new areas of contention, and with the first foray stone standing at the position that will definitively characterize the new territorial clash, it has the advantage, and may thus be able to control the progression of the new territorial conflict.
However, exploratory forays that produce no positive gains in territory or stones can result in unnecessary losses of territory and stones by foregoing the immediate exploitive fights where it was believed that AlphaGo would be more effective; but a supposedly more coldly logical AlphaGo computer showed that it could be adventurous as well as meticulous.2
For example, AlphaGo's fifth-line shoulder hit of black 37 in Game 2 was met with astonishment from commentators and even Lee himself, for it seemingly gave up too much potential boundary territory to white; indeed, a post-match check of AlphaGo's game log found that the move was ranked at only a 10^−4 reinforcement learning action value, yet this move, widely regarded as exploratory, was seen as the key to victory in Game 2 because it unified AlphaGo's total board position by subsequent consolidation with the upper right quadrant of the board.3
By the same token, because of the greater unpredictability of high reinforcement rewards for exploratory moves, AlphaGo's pursuit of delayed position rewards (sente) can easily lead to a dearth of exploiting the latent lingering possibilities that a well-placed stone in a fight presents; that is, “taste” (aji). Moreover, in Game 4, AlphaGo lost perhaps because Lee's “divine wedge move”, white 78, being so unexpected, discombobulated AlphaGo into a number of “mistakes”, perhaps paying the price of ε-greediness.
Does this mean that AlphaGo can be surprised and addled, just like Deep Thought against the pawn-line defense or a paranoia-prone human player like Kasparov?
Should AlphaGo use a cross-entropy cost function to handle such surprises? Does a deep artificial neural network actually have frailties in all those hidden layers? Are AlphaGo's emotions analyzable, or are those emotions concealed deep within an artificial neuron network's hidden layer psyche? Is any emotion or indecision discernible in AlphaGo's subsequent moves as shown in the figure above after Lee Sedol's white 78?
There were worries that AlphaGo, after its Game 4 loss, might change its tactics, or worse, alter its policy and succumb to confusion in the face of an opponent's improbable moves (just as in Kasparov's anti-computer chess strategy). However, AlphaGo won a close Game 5 and the Match, apparently able to recover from its mistakes and remain (uncharacteristically for a computer) adventurously ε-greedy.
Toronto's Video Gamer versus An Expert Gamer Human was contested on the Atari 2600 testbed games of Beam Rider, Breakout, Enduro, Pong, Q*bert, Seaquest, and Space Invaders, presenting 210 × 160 raw-pixel RGB video at 60 Hz in a game-playing environment that was specifically designed to be difficult. The game-playing model is a convolutional neural network trained with a reinforcement learning variant of Q-learning.
The playing agent's sampled experiences at each time step were pooled over many episodes into a replay memory and smoothed over many past behaviors. Q-learning is applied during an inner do-loop of the game-playing algorithm employing stochastic mini-batch updates that reinforcement-learn from the visual inputs to the computer vision deep convolutional neural network.
The input to the neural network is an 84 × 84 × 4 image; the first hidden layer is a convolutional layer of 16 8 × 8 filters with stride 4; the second hidden layer convolves 32 4 × 4 filters with stride 2, and the final hidden layer is a fully connected rectifier layer of 256 units, in what was called a deep Q-network (DQN). The output layer is a fully connected vector with a single output for each valid action. The number of valid actions varied from 4 to 18 in the games.
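The layer dimensions quoted above can be checked with the standard convolution output-size formula, (input − filter) / stride + 1, assuming no padding:

```python
def conv_out(size, filt, stride):
    # Spatial output size of a valid (no-padding) convolution.
    return (size - filt) // stride + 1

side = 84                    # 84 x 84 x 4 input image
side = conv_out(side, 8, 4)  # first hidden layer: 16 filters of 8 x 8, stride 4 -> 20
side = conv_out(side, 4, 2)  # second hidden layer: 32 filters of 4 x 4, stride 2 -> 9
flat = side * side * 32      # flattened features feeding the 256-unit rectifier layer
print(side, flat)            # prints: 9 2592
```

So the fully connected rectifier layer of 256 units sits on top of 9 × 9 × 32 = 2592 convolved features.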
The same network architecture, learning algorithm, and hyperparameters were used for all seven games with no hand-engineered game rules provided. Since the scores varied greatly from game to game, all positive rewards were set to 1 and all negative rewards to −1, with 0 rewards for moves having no effect, thereby providing a very generalizable model. The simple reward structure limits the scale of the error derivatives and facilitates the employment of the same learning rate for different games, but all at the risk of slowing the agent's progress toward its Goal because there is no great differentiation of rewards.
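The reward clipping described above is a one-liner; the raw score changes used in the demonstration are arbitrary examples:

```python
def clip_reward(score_delta):
    """Clip a raw game-score change to {+1, 0, -1}, limiting the scale
    of the error derivatives across games with very different scoring."""
    if score_delta > 0:
        return 1
    if score_delta < 0:
        return -1
    return 0

print([clip_reward(d) for d in (250, -75, 0)])  # [1, -1, 0]
```

Because every game's rewards land in the same ±1 range, one learning rate can serve all seven games.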
In supervised learning, the model's performance can be tracked with training, validation, and test data sets, but in reinforcement learning the evaluation algorithm took the total reward the agent collects in an episode or total game averaged over a large number of games.
The estimated action-value Q-function was used, which provides an estimate of how much delayed reward the agent can obtain by following through on its network policy from any given state.
As the reinforcement learning algorithm ran, there were relatively smooth improvements to the predicted Q-values during reinforcement learning, and there was no divergence in any of the games (surprising, but likely because of the limits placed on the error derivatives). This suggests that, despite lacking any theoretical convergence guarantees, the game algorithm is able to train large neural networks using reinforcement learning signals and stochastic gradient descent in a very stable manner.4
In a more difficult challenge in 2018, DeepMind's AlphaStar beat all human gamers in the real-time online video game StarCraft II. That Toronto's Video Gamer, without even first knowing the rules of the games, can play many different games and perform as well or better than expert human gamers appears to exactly match what is meant by the first definition of intelligent “learning”, and if AlphaStar can adapt to the more complicated StarCraft II, then it precisely satisfies the definition of generalizable intelligence in Chapter 3; that is,
The ability to acquire and apply knowledge and skills
with the buttressing addition,
the ability to perceive or infer information, and to retain it as knowledge to be applied towards adaptive behaviors within an environment or context,
and
general cognitive problem-solving.
What is more disturbing, DeepMind has developed a team of video game agents that were trained to act independently while cooperating to compete in 450,000 runs of Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. The agents in concert developed strategies that could defeat human player teams in tournament-style evaluations, using a two-tier optimization process in which independent RL agents are trained concurrently on randomly generated environments, with each agent learning its own rewards. Swarms of robots cooperating to achieve specified competitive goals are ominously clear on the AI horizon.5
In the board games of checkers, chess, and Go, the players know the exact state of all the pieces at every point in the game in perfect information competitive settings. In contrast, poker is an imperfect information game where there are hidden cards unknown to both the player and his opponent.
Carnegie Mellon's Libratus poker-playing AI computer not only correctly guessed the quality of the opponent's hand without complete information, but also prevented the opponent from accurately guessing its own cards, solely by analyzing the timing and amount of bets in a round. How did Libratus do this so successfully that it could defeat the world's top players in Texas Hold’em?
A computer cannot display feigned anguish or joy to deceive an opponent when dealt a hand or during betting, but by the same token, it is immune to the opponent's facial and body language subterfuges.
Therefore, both the computer and its human opponent must rely only on the timing and amounts of its betting to conceal a good or bad hand, and entice calls and raises or provoke premature folding solely by means of the bluff and bluster of the size and timing of the bets.
A player obviously must mix up his betting strategy throughout a game to avoid patterned behavior (overfitting) being recognized and exploited by the opponent, and likewise be able to recognize any patterns of betting behavior of the opponent, and adjust his own betting strategy for advantageous responses.
Is there a scientific basis for developing a poker-betting strategy that can be performed by a computer? Winning is of course based on probabilities in a constrained system that can be gauged and exploited, something that every poker player knows all too well.
Betting strategy can be based on the Bayes formula and probabilities assigned to Markov chain nodes in Monte Carlo tree-search simulations, but the upper hand in the finite, adversarial, imperfect information game of poker curiously lies in exploiting the Nash equilibrium of mathematical game theory.
A strategic equilibrium in a finite adversarial game is reached when every player employs a strategy such that no one player can benefit by changing strategies while the other players do not change their strategies. The mathematician John Nash proved that such an equilibrium exists in every finite game.
We have all experienced the Nash equilibrium in the adversarial and trivially finite sequential game of tic-tac-toe where one typically starts at a corner, and if the opponent is not aware of the Nash equilibrium and also plays a corner, the first player will win. If the opponent plays any other position, and the players proceed rationally, the Nash equilibrium will hold and stalemates will be reached for every game where neither player will ever win.
A zero-sum game where each player attempts to maximize his payout and minimize the opponent's payout is the adversarial but not sequential game of rock-paper-scissors. The payouts, where winning is 1, losing is −1, and a draw is 0, can be displayed in the strategic form shown below, where in each element of the game matrix the first number is Player A's payout and the second number, after the comma, is Player B's payout (rows are Player A's choices, columns Player B's):

              Rock        Paper       Scissors
  Rock        0, 0        −1, 1       1, −1
  Paper       1, −1       0, 0        −1, 1
  Scissors    −1, 1       1, −1       0, 0
It turns out statistically that if one randomly plays each choice 33% of the time, in the very long run, that strategy cannot be exploited, and if the opponent discovers that and plays the same way, neither player can exploit the other, and over time the players are at an impasse in a rock-paper-scissors Nash equilibrium where neither has an advantage and each will end up losing as much as winning.
But if one player diverges from the equilibrium strategy, for example increasing the percentage of paper plays, although winning in some instances, that player's departure from the Nash equilibrium can be exploited, and in the long run since he cannot exploit an opponent who strictly adheres to the equilibrium strategy, he will lose more than he wins because he withdrew from the Nash equilibrium.
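This can be checked numerically with the standard rock-paper-scissors payoffs: against the uniform 1/3 mixture any strategy earns expected payoff 0, while a paper-heavy deviator can be punished by an opponent who shifts toward scissors. The deviating mixture chosen here is an arbitrary illustration:

```python
# Row player's payoff: 1 win, -1 loss, 0 draw; order: rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def expected_payoff(p, q):
    # Expected row-player payoff when the row player mixes with
    # probabilities p and the column player mixes with q.
    return sum(p[i] * q[j] * PAYOFF[i][j] for i in range(3) for j in range(3))

uniform = [1/3, 1/3, 1/3]
paper_heavy = [1/6, 2/3, 1/6]   # the deviator over-plays paper
all_scissors = [0, 0, 1]        # the exploiter responds with scissors

print(expected_payoff(paper_heavy, uniform))       # 0.0: uniform mix cannot be exploited
print(expected_payoff(paper_heavy, all_scissors))  # < 0: the deviation is punished
```

The uniform mix is the Nash equilibrium: no deviation gains against it, and any deviation opens the deviator to exploitation.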
To demonstrate how a player can maximize his payout while minimizing an opponent's payout in a more complicated game, consider the football penalty kick (PK), whose strategic form matrix is shown below.
The PK has more subtle payouts, based for instance on a left-footed penalty-kicker's stronger right (R) direction shot and a goalie's stronger right-lunge block of a kicker's left (L) side shot; and considering that statistically most PKs are made, the goalie's payout is generally lower than the kicker's, but still dependent in many cases on the direction of the goalie's blocking move, as shown by the data-based elements (here arbitrarily chosen with regard to the different payouts) in the PK strategic form matrix.
Now mathematically, the zero-sum or constant sum k is just,

u1(s1, s2) + u2(s1, s2) = k
where s1 is the kicker S1's strategy, and s2 is the goalie S2's strategy. The minimax theorem states that the strategies for the kicker and the goalie (s1, s2) are in an equilibrium of a zero-sum game if and only if,

u1(s1, s2) ≥ u1(s1′, s2) for all s1′, and u2(s1, s2) ≥ u2(s1, s2′) for all s2′
where the prime indicates a dummy variable, and
Thus for S1 (and similarly for S2),
This is simply saying that player S1 is trying to find the s1 that maximizes his utility while his opponent S2, whose payout is the negative of S1's payout, is trying to minimize S1's utility; and similarly vice versa for player S2, all of which together constitutes the equilibrium strategy.
The kicker will maximize and minimize according to the values arrayed in the strategic form matrix as,
Now since the probability of the converse is just (1 − the probability of the chosen strategy),
To find the minimum outcome on S2's strategy, take the minimum part only,
then taking the derivative with respect to s2(L) and setting it equal to zero to find the minimum,
Since s1(R) = 1 − s1(L), then s1(R) = 1/2 as well, and the kicker should mix his PK strategy half-and-half to the left and right, which seems eminently reasonable; but as in all of mathematics, you must still demonstrate it, and here it is demonstrated.
However, repeating the procedure above for the goalie S2, the result for minimizing the kicker S1's strategy is,
revealing that the goalie's strategy should be to lunge to the right 3/4 of the time and to the left 1/4 of the time to block the penalty kicks, actions which reflect his stronger side but nonetheless are demonstrated here mathematically. Of course these outcomes are the product of the kicker and goalie data in the strategic form, once again demonstrating the importance of Big Data, even for the ostensible unpredictability of sports.
These strategies together produce the equilibrium strategy for the players in this particular PK situation, which means that if either that kicker or that goalie deviates from his strategy and the other does not, the deviator will lose over the long run of many, many PKs; that is just the essence of the Nash equilibrium.
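The indifference calculation just walked through can be sketched for any 2 × 2 zero-sum game: the row player's equilibrium mix is the one that makes the column player indifferent between his two responses. The payoff matrix below is a hypothetical stand-in, not the book's PK matrix:

```python
from fractions import Fraction

def mixed_equilibrium_2x2(A):
    """Row player's equilibrium mix (p, 1-p) for a 2x2 zero-sum game,
    where A is the row player's payoff matrix: solve
    p*A[0][0] + (1-p)*A[1][0] = p*A[0][1] + (1-p)*A[1][1]."""
    a, b = A[0]
    c, d = A[1]
    p = Fraction(d - c, (a - c) - (b - d))
    return p, 1 - p

# Hypothetical kicker payoffs (rows: kick L, kick R; columns: goalie lunges L, R).
A = [[Fraction(0), Fraction(2)],
     [Fraction(2), Fraction(0)]]
print(mixed_equilibrium_2x2(A))  # a symmetric matrix yields the half-half mix
```

With an asymmetric matrix the same indifference condition produces lopsided mixes like the goalie's 3/4 and 1/4 above; exact rational arithmetic via Fraction keeps the result clean.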
This analysis, however, is based on the weighting of the perceived strengths of the kicker and the goalie, and it has been shown that in PK situations over many games, there is no statistical difference in the directions of shots taken and goalie lunges, and that players mostly behave randomly with only a slight tendency to go to strong sides.1
Perhaps the only thing that men like more than football is gambling, with the voluble mano a mano game of heads-up, no-limit Texas Hold’em poker having special appeal. Expert card players will always rely on the probabilities insofar as they can ascertain them, but since there are 10^161 decision points in a game of poker, traversing the entire game tree even once is impossible, and a deterministic choice for each point clearly cannot be obtained for an entire game, so the Nash equilibrium for the whole game is almost impossible to determine.
Instead, a model of the abstraction of cards and action is developed, which in computer science jargon means the removal of physical, spatial, and temporal details and attributes in order to focus on the essentials of the task at hand, in this case the cards and betting actions in a game of poker.
In the abstraction, there are many strategically similar situations that can be classified together for tree search, for example similar hands like early-round king-high and queen-high flushes (card abstraction) and similar bets like $500 and $595 can be classified together in increments of $100 (action abstraction).
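Action abstraction of the kind described amounts to simple bucketing; the $100 increment follows the example above, and the function name is illustrative:

```python
def abstract_bet(bet, increment=100):
    # Map a raw bet to its abstraction bucket: bets within the same
    # $100 band (e.g. $500 and $595) are treated as the same action.
    return increment * (bet // increment)

print(abstract_bet(500), abstract_bet(595))  # both fall in the 500 bucket
```

Collapsing strategically similar bets (and similarly, strategically similar hands) into one node is what shrinks the 10^161-point game tree to something searchable.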
Libratus’ first of three main modules computed approximate Nash equilibrium solutions using minimax theory in an abstraction of poker to serve as a game strategy blueprint for the early rounds of the game which in turn served as a precursor strategy for later rounds. The initial betting actions in the abstraction were patterned after the most common bet actions by the top contenders in the Annual Computer Poker Competition (ACPC), to provide the strategy form matrix elements. If during play, the opponent chooses an action that is not in the abstraction, that action is mapped to a similar action that is in the abstraction.
The blueprint strategy was honed by Libratus playing simulated games against itself in reinforcement learning, using a modified version of the Monte Carlo Counterfactual Regret Minimization (MCCFR) iterative algorithm that independently minimizes “regret” at every decision point, registering how much regret there is at not having chosen an action in the past (anyone who has ever played poker can easily see how particularly germane the MCCFR is to poker). So given the opportunity, Libratus will choose the action with the highest regret, gradient-descent backpropagate, and after many game iterations the average of the regrets is minimized to approach zero, thereby improving the blueprint strategy.
In simulated games, one player will explore every possible action in the abstraction and update his regrets while the opponent plays solely on his current regrets. The roles of the two players are then reversed after each hand. An objective probability distribution is thus determined on the basis of the regrets of actions in previous games, thereby providing the value of a betting action. That value also depends on the probability of it being played in a later hand, where the value decreases if over- or under-played, as determined by the assessment of the opponent's responses and one's own weaknesses as revealed by the play.
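The regret-matching core of this procedure can be sketched in a few lines. This is the plain (not Monte Carlo, not poker-scale) version of turning cumulative regrets into a playing distribution, with hypothetical regret values for a three-action choice:

```python
def regret_matching_strategy(regrets):
    """Turn cumulative regrets into a mixed strategy: play each action
    in proportion to its positive regret, or uniformly if none is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    if total == 0:
        return [1.0 / n] * n  # no action is regretted: play uniformly
    return [p / total for p in positive]

# Hypothetical cumulative regrets for three betting actions:
regrets = [2.0, 6.0, -1.0]
print(regret_matching_strategy(regrets))  # [0.25, 0.75, 0.0]
```

Iterating this across self-play, with the roles reversed each hand as described above, is what drives the average regret toward zero and the blueprint toward equilibrium.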
In “heads up, no-limit” Texas Hold’em poker, the “heads-up” refers to only two players playing against each other so that there is no possibility of collusive ganging-up on a player as can happen in multiple-player games; the “no-limit” means that a player can bet up to all his chips at one time, making for much greater betting variability and risk.
In heads-up but limited-bet poker-playing systems, there are only 10^13 unique decision points; if both players play according to the MCCFR in a zero-sum game, their average strategies converge to a Nash equilibrium. But to account for the 10^161 static decision points in no-limit Hold’em poker, Libratus improved the MCCFR by a sampled form of Regret-Based Pruning (RBP) where the high regret branches are pruned.
Then the whole game is broken down into subgames that are individually amenable to Nash equilibrium calculations which are used primarily for defensive purposes in finding one's own weaknesses to avoid being exploited by one's opponent, and secondarily for offensive purposes of finding weaknesses in the opponent's strategy and exploiting them. The strategy blueprint then can be iteratively adjusted and calculations speeded up in response to the agent and the opponent's play.2
In game theory, a subgame is any subset of a game where all members of the subset belong to the subgame, which has a single initial node and includes all of its own successor nodes (as shown schematically in the figure at right, where there are altogether six subgames, two of which contain two subgames each, as enclosed by the ovals).
A subgame therefore is a game that constitutes a game in its own right, and by dint of its designed isolation, credible threats germane only to the whole game are in principle eliminated in the subgame, thereby allowing the player to concentrate on analyzing the subgame while ignoring the whole game's earlier history and later progressions.
In a two-player sequential game, as in the figure, Player A chooses an action, up or down, at the initial node of the whole game. Player B can then choose to go left or right in a subgame within the ellipses, depending on the action chosen by Player A. A strategic-form matrix of the probable outcomes of the subgames can then be constructed, just as in the PK situation, and a Nash equilibrium calculated using the minimax value of each subgame.
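The backward-induction logic described above, solving each subgame first and then letting Player A best-respond to those solutions, can be sketched as follows; the tree and payoffs are made up for illustration:

```python
# Game tree: Player A moves up/down, then Player B moves left/right in the
# resulting subgame. Payoffs are (A, B); the numbers are illustrative only.
TREE = {
    "up":   {"left": (3, 1), "right": (1, 2)},
    "down": {"left": (2, 1), "right": (0, 0)},
}

def solve_subgame(subgame):
    """Player B picks the branch maximizing B's payoff in this subgame."""
    return max(subgame.items(), key=lambda kv: kv[1][1])

def backward_induction(tree):
    # Solve each subgame first, then let Player A best-respond to the solutions.
    solved = {a: solve_subgame(sub) for a, sub in tree.items()}
    a_move = max(solved, key=lambda a: solved[a][1][0])
    b_move, payoffs = solved[a_move]
    return a_move, b_move, payoffs

print(backward_induction(TREE))  # → ('down', 'left', (2, 1))
```

Here "up" is worse for A because B would answer with "right", illustrating how each subgame's solution feeds the whole-game decision.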
In Libratus' second module, a nested safe-subgame solver provides strategy in a subgame from an estimate of the value of reaching that subgame at the Nash equilibrium. The blueprint-strategy module has already estimated the overall Nash equilibrium, and hence this value for every subgame, and with these values as input the subgame solver solves a finer-grained abstraction of the subgame actually reached in real time; that is, it solves a new subgame every time the opponent chooses an action that is not in the fine-grained abstraction, effectively constructing a new subgame containing that action every time the opponent bets, thereby automatically and repeatedly calculating ever more finely grained strategies as play progresses.3
In actual play, however, a subgame should not be solved in complete isolation, because winning strategies may depend on prior subgames and on parts of the game not yet reached. If the blueprint is nevertheless slavishly followed, so-called unsafe subgame solving, an opponent can recognize the patterns as simple isolated gambits and exploit them with a more comprehensive strategy that takes the whole game into account.

To offset this, safe subgame solving still places all actions within the strategy blueprint, but a more detailed subgame abstraction using minimax aims to leave the opponent worse off no matter what cards are held, approximating an optimal strategy by assessing how much more a player would lose against a worst-case opponent action than by simply following the blueprint; in this way overall considerations are reflected in the subgame solution.
Libratus employs a dense action abstraction in the first two betting rounds of a game. In the self-improver third module, the branches missing from the a priori blueprint are filled in and a game-theoretic strategy computed for them, with the opponent's actual actions guiding the tree-search filling-in. If the opponent bets an amount that is not in the abstraction, the bet is rounded off to a nearby size that is in the abstraction; this, however, slightly distorts the strategy and the estimates of reaching certain subgames, so the rounding error must be reduced by adding a small number of actions to the abstraction.

Which actions are added depends on the most frequent actions chosen by the opponent and on how far those actions lie from the solution to the abstraction, thereby effectively filling in the missing branches in the blueprint abstraction. Once an action is selected, a strategy for the new branches is calculated by the techniques of the nested safe-subgame solver.
In this way, Libratus augments and refines the pre-computed blueprint over time, based on the weaknesses in its own game that the opponent has found in the strategy blueprint, as revealed by the opponent's actual play.

Libratus is thus not only learning how to exploit the opponent's play, but also learning how to make its own play less exploitable.

In an imperfect-information benchmark AI challenge, Carnegie Mellon's Libratus trounced four professional players in succession in a heads-up, no-limit Texas Hold’em competition. Libratus uses no expert domain knowledge, and its techniques are game-independent, so they can be applied to different opponents and to other imperfect-information activities such as business, finance, politics, diplomacy, and even warfare.4
After two years of further development, Carnegie Mellon's new Pluribus, not limited to heads-up play, took on six players simultaneously in 15 no-limit Texas Hold’em matches and convincingly won them all. One would think that, with the many more possibilities of multiplayer poker, Pluribus would need more computing power than the 100 CPUs Libratus used, but Pluribus needed only two CPUs to defeat multiple top professional poker players.

The reason is that Pluribus took some pages out of AlphaGo Zero's playbook: reinforcement learning and self-play over trillions of poker hands. Starting from zero, betting randomly, learning, and then refining its play by checking back after each training hand against itself to see which betting actions actually won the most money, Pluribus used the law of large numbers to defeat the best human players, who in their entire lives could never accumulate that much experience.5
The multiplayer game better reflects real-life situations of imperfect knowledge, and Pluribus' capability can be used to advantage in economic and geopolitical negotiation, fraud detection (fittingly, given its roots in poker), and autonomous driving, where one deals with many traffic, obstacle, and rules “adversaries” at once.

After convincing victories in urbanely intellectual board games, quiz shows, and debating-society debates, the AI machine has also proved that it can excel in youthfully reflexive video games and the raucous world of Texas Hold’em poker, and so can confidently enter the real world.
The first technical problem in natural language processing is getting spoken words into the computer for analysis. If you put your hand on your throat while you speak, you will find that it vibrates, sending out longitudinal sound waves that compress the air in periodic puffs; breaks in the puffs, as in unvoiced stop sounds such as “p”, are one way utterances are differentiated. When the wave impinges on a listener's eardrum, the eardrum vibrates in step with the impinging waveform, and the auditory nerve sends electrical signals proportional to the sound waveform to the brain for speech processing.

In machine front-end processing, the sound wave impinges on the diaphragm of a microphone attached to a Fast Fourier Transform (FFT) analyzer, which produces a graph of air-vibration amplitude over time, different sounds producing differently shaped waveforms, as in Amazon's “Happy Birthday” greeting in the figure at right. Distinct sounds have higher amplitudes, manifested as the distinctive peaks called formants shown in the figure.1
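A minimal sketch of what such an FFT front end does, assuming nothing about the actual analyzer: synthesize a tone with two component ("formant-like") frequencies and recover them as spectral peaks. The sample rate and frequencies are illustrative:

```python
import numpy as np

rate = 8000                       # samples per second (illustrative)
t = np.arange(0, 0.5, 1 / rate)  # half a second of signal
# Two component frequencies standing in for formants.
signal = 1.0 * np.sin(2 * np.pi * 300 * t) + 0.6 * np.sin(2 * np.pi * 2300 * t)

spectrum = np.abs(np.fft.rfft(signal))          # amplitude spectrum
freqs = np.fft.rfftfreq(len(signal), 1 / rate)  # frequency of each bin

# The two largest peaks sit at the component frequencies.
peaks = sorted(float(f) for f in freqs[np.argsort(spectrum)[-2:]])
print(peaks)  # → [300.0, 2300.0]
```

A real analyzer works on short overlapping windows of speech rather than a clean half-second tone, but the peak-finding idea is the same.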
According to Fourier, no matter how complicated a wave is, it can be represented as an infinite sum of sine and cosine waves having different amplitudes a_n, b_n and frequencies nω over time,

f(t) = a_0 + Σ_{n=1}^{∞} [a_n cos(nωt) + b_n sin(nωt)]

where a_0 is just the coefficient for n = 0, since a_0 cos(0·ωt) = a_0, and there is no b_0 because for n = 0, b_0 sin(0·ωt) = 0. For n ≥ 1, the a_n and b_n are the Fourier coefficients, which describe the "amounts" of cosine and sine of each frequency in the sound wave, as represented by their amplitudes; for a wave of period T they are given by

a_0 = (1/T) ∫ f(t) dt,  a_n = (2/T) ∫ f(t) cos(nωt) dt,  b_n = (2/T) ∫ f(t) sin(nωt) dt,

with all integrals taken over one period T.
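The coefficient formulas can be checked numerically; a square wave is a convenient test because its Fourier coefficients are known in closed form (b_n = 4/(nπ) for odd n, all a_n = 0). The discretization below is an illustrative sketch:

```python
import numpy as np

T = 2 * np.pi
omega = 2 * np.pi / T
t = np.linspace(0, T, 200000, endpoint=False)
dt = t[1] - t[0]
f = np.where(t < T / 2, 1.0, -1.0)  # square wave: +1 then -1 over one period

def a(n):  # cosine "amount" at frequency n*omega
    return (2 / T) * np.sum(f * np.cos(n * omega * t)) * dt

def b(n):  # sine "amount" at frequency n*omega
    return (2 / T) * np.sum(f * np.sin(n * omega * t)) * dt

# b(1) ≈ 4/π ≈ 1.273, b(3) ≈ 4/(3π) ≈ 0.424, and a(n) ≈ 0 for every n.
print(round(b(1), 3), round(b(3), 3), round(a(1), 3))
```

The sums are just Riemann approximations of the (2/T)∫ integrals above, one frequency at a time.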
From the shape of the “Happy Birthday” waveform above, it can be seen that a series of sine and cosine functions with different amplitudes and frequencies indeed has the capability to represent speech.2

It turns out that the waveforms of vowels repeat very clearly within sentences, and because of their distinctive shapes at different frequencies and their naturally larger amplitudes, vowels are the most acoustically distinct sounds humans can utter, so a listener can easily tell them apart.

The amplitude of sound as a function of frequency is called the sound spectrum; it shows the discrete locations of the formants, which reveal the speech characteristics of loudness, pitch, intonation, and accent.
Human hearing depends critically on the perception of proportion, which is manifested more distinctly on a logarithmic than on a linear scale. For example, the notes of the musical scale rise in pitch, and in different octaves the distances between them are perceived as about the same; that is, the step from do, re, mi to the next octave's do, re, mi sounds like an equal stride, but the frequency has actually doubled.

For the vowels, the formants are left-right symmetric about a central frequency on the logarithmic frequency scale, and from their spectral endpoints one can find the maximum distance between formants, so the vowels may be classified as uttered “long” or “short”. Most electronic speech recognition systems, regardless of language, rely heavily on vowel recognition as a starting point.
For non-periodic waveforms, which are more like those of natural speech, the period of the Fourier representation is taken to infinity, and the transform must be derived by changing over to the frequency domain, because as the period goes to infinity the 1/T factor in the Fourier coefficients becomes untenable. This is easily remedied, since frequency is just the inverse of the time period: as the period approaches infinity, the frequency interval in the wave's spectrum goes to zero, and the discrete sound spectrum of amplitudes as functions of frequency conveniently changes from histogram-like peaks and valleys into smooth, continuous, differentiable curves that can still be represented in terms of the elementary sines and cosines.
Rewriting the Fourier coefficients in complex form, changing f(t) to g(t) to avoid confusion with f for frequency, and combining the Fourier series and coefficient equations, the Fourier transform and its inverse are,3

G(f) = ∫_{−∞}^{∞} g(t) e^{−i2πft} dt,  g(t) = ∫_{−∞}^{∞} G(f) e^{i2πft} df

where the −∞ in the lower limit of integration over time admits the past, and although negative frequencies running from minus infinity may give one pause, the equations above give the relationship among the amplitudes of the component waves, which is what is needed to model an auditory waveform.
The Fourier coefficient formulas represent the components of a waveform by extracting a single period of the wave and finding the area (integral) under it for a given frequency, one frequency at a time.

Applied to a speech waveform, the inverse Fourier transform g(t) is the integral over frequencies of the Fourier transform G(f), an area that must be calculated. To do so, g(t) can first be taken as a function of discrete points in time, producing a plot of amplitude over time that can be read off the speech waveform by the FFT analyzer; multiplying those amplitudes by a sinusoid that just fits the time range for which there are amplitude values, with a period set by the number of observed wave oscillations, produces a discrete bar graph whose combined area approximates the integral of the waveform.

The Fourier transform thus breaks any waveform into its component simple waves and renders the overall shape of the waveform recognizable from just a sampled portion of it. The discrete Fourier transform is what makes FFT digital calculation by computers possible.
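The discrete Fourier transform the FFT computes can be sketched directly from its defining sum and checked against an optimized FFT implementation; the test signal is illustrative:

```python
import numpy as np

def dft(x):
    """Naive O(N^2) discrete Fourier transform, one frequency bin at a time."""
    n = len(x)
    t = np.arange(n)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * t / n)) for k in range(n)])

x = np.sin(2 * np.pi * 3 * np.arange(32) / 32)  # 3 cycles over 32 samples
print(np.allclose(dft(x), np.fft.fft(x)))  # → True
```

The FFT computes exactly these sums, but reuses shared sub-sums to run in O(N log N) instead of O(N²), which is what made real-time spectral analysis of speech practical.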
Since any sound can be represented graphically with electronic instruments, it was natural in the early days of speech recognition to use special-purpose electronic hardware for top-down acoustic-phonetic speech recognition. For example, Japan's Radio Research Lab used a filter-bank spectrum analyzer, with logic connecting each channel of the analyzer in a weighted manner to a vowel-decision circuit to recognize vowels. The Russians first developed the critical time alignment of a pair of frequency-warped utterances (exploiting the logarithmic perception of pitch) for dynamic frequency warping; in America, RCA Labs modeled the non-uniformity of time scales in speech using a combination of both, and Raj Reddy at Stanford pioneered continuous speech recognition by dynamically tracking phonemes (perceptually distinct units of sound that distinguish one word from another), a technique incidentally first used for synthetic spoken moves in computer chess.

Through the next forty years, isolated-word and speech-pattern sound spectra, linear predictive coding (LPC), and dynamic programming were actively researched, and IBM and AT&T Bell Labs developed large-vocabulary, speaker-independent commercial speech recognition systems for use in computers and telephony.
The technology generally comprised a bank-of-filters front-end analyzer that first separated very different voice pitches, such as men's from women's, and produced a set of signals representing the energy of the sound in each frequency band, thereby creating the sound spectrum of an utterance.

Linear predictive coding models, with time-dependent digital filters, the effects of the glottal pulse (from the space between the vocal-cord folds) representing sound intensity and pitch, the vocal-tract (throat and mouth) resonances producing the formants (distinctive frequency peaks), and the tongue, lips, and throat that produce the hisses and pops of a typical utterance.
LPC signal-source front-end processing assumes that a given speech sample s(t) at time t can be approximated by a linear combination of the n past speech samples s(t − i) multiplied by predictor coefficients a_i, plus a gain factor G multiplied by a normalized excitation signal u(t),

s(t) ≈ Σ_{i=1}^{n} a_i s(t − i) + G·u(t)

Since the signal changes with time, the predictor coefficients at a given time must be estimated from a short segment of the speech signal around that time, the estimation being performed at a rate of 0 – 50 frames per second. The idea is to determine the set of predictor coefficients {a_k} directly from the speech signal, so that the spectral properties of the digital filter best match those of the speech waveform within the frame, by minimizing the mean-squared error between the prediction and the speech samples for that frame.
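The least-squares estimation of the predictor coefficients {a_k} for one frame can be sketched as follows; the synthetic two-coefficient signal is an illustrative assumption, not the book's data:

```python
import numpy as np

def lpc(frame, n):
    """Least-squares estimate of n predictor coefficients for one frame."""
    # Row t holds the n samples preceding frame[t], most recent first.
    A = np.array([frame[t - n:t][::-1] for t in range(n, len(frame))])
    y = frame[n:]
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs  # coeffs[i-1] multiplies s(t - i)

# Synthetic check: a signal built from s(t) = 1.3 s(t-1) - 0.4 s(t-2)
# (illustrative coefficients) is recovered by the estimator.
s = np.zeros(50)
s[0], s[1] = 1.0, 1.3
for t in range(2, 50):
    s[t] = 1.3 * s[t - 1] - 0.4 * s[t - 2]
print(np.round(lpc(s, 2), 4))  # recovers [1.3, -0.4]
```

Minimizing the mean-squared prediction error over the frame is exactly what `lstsq` does here; production LPC typically solves the equivalent autocorrelation normal equations instead.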
The results of the filter-bank and LPC modeling are source-coded to convert the signals into a sequence of binary digits, encoded as a series of vectors representing the time-varying spectral characteristics of the speech signal. This so-called vector quantization encodes an input vector as an integer index into a codebook of reproduction vectors, chosen by minimizing spectral distortion, which can then be used as a recognition preprocessor and/or a training dataset for a speech recognition artificial neural network.4
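Vector quantization as described can be sketched with a k-means codebook; the toy "spectral frames" and the squared-Euclidean stand-in for spectral distortion are illustrative assumptions:

```python
import numpy as np

def kmeans_codebook(vectors, k, iters=20, seed=0):
    """Build a k-entry codebook of reproduction vectors by k-means."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)].copy()
    for _ in range(iters):
        idx = np.argmin(((vectors[:, None] - codebook) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(idx == j):
                codebook[j] = vectors[idx == j].mean(axis=0)
    return codebook

def quantize(vectors, codebook):
    """Encode each vector as the integer index of its nearest codeword."""
    return np.argmin(((vectors[:, None] - codebook) ** 2).sum(-1), axis=1)

# Toy "spectral frames" clustered around two centers.
rng = np.random.default_rng(1)
frames = np.vstack([rng.normal(0, 0.1, (50, 3)), rng.normal(5, 0.1, (50, 3))])
book = kmeans_codebook(frames, k=2)
codes = quantize(frames, book)
# Frames from the same cluster receive the same codebook index.
```

The stream of integer indices, far smaller than the raw spectra, is what a downstream recognizer or network then consumes.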
In the acoustic-phonetic front-end recognizer, an input frame is matched against a reference set of features. Spectral features such as compactness, gravity, stress, and flatness can serve as the reference for classifying vowels, with the decision on the presence or absence of each feature based on threshold values of acoustic parameters such as formant amplitude, spectral-band energy, and duration. A vowel decision tree can then be employed to test each proposition about the speech in sequence.

Acoustic-phonetic modeling using filter banks or discrete Fourier transforms for speech segmentation, labeling, and vowel and sound classification could extract features and largely identify individual words and some whole sentences; however, because of the vagaries of strung-together spoken language and the lack of a tuning mechanism to improve recognition, the acoustic-phonetic approach by itself could not produce a generalizable automatic speech recognition system.
High-quality text and speech identification are both improving with advances in optical character recognition and acoustic-phonetic electronics. However, the identification of words is one thing, the recognition of words quite another, and text and speech recognition, to say nothing of natural language processing, share the same semantic problems.

In 1949, when the American mathematician Warren Weaver proposed using a computer for text translation, the idea seemed simple enough: given a sentence in one language, a computer would recognize each word by its spelling, look it up in a bilingual dictionary stored in memory, and match it with the corresponding word in the second language; the computer's logic would then arrange the translated words according to the rules of grammar of that second language.
During the Cold War of the 1950s, both the Russians and the Americans were eager to use machine translation (MT) to quickly translate each other's documents. Natural language, however, is fraught with ambiguity and inference, and early MT attempts produced ridiculous translations: the saying “the spirit is willing but the flesh is weak”, translated into Russian and then back into English, came out as “the vodka is good but the meat is rotten”, demonstrating that although particular words could be recognized and translated, their meaning was quite another matter.

The lack of progress, together with machine translations such as “water goat” for the Russian for “hydraulic ram”, brought on ridicule, culminating in the first “AI Winter” of 1966, when the American National Research Council canceled all research support for automatic machine translation.
Any language translation must contend with sayings, usage, idioms, vernacular, slang, implication, innuendo, turns of phrase, puns, abbreviations, acronyms, and multiple meanings of the same word in different contexts, and is therefore unavoidably fraught with uncertainty and ambiguity.

For speech recognition, add to that different speakers' varieties of accent, pronunciation, articulation, roughness, nasality, pitch, inflection, speed, timing, emotion, humor, sarcasm, and so on, all of which renders accurate top-down machine translation of text and speech almost literally impossible.

Noam Chomsky's Language Acquisition Device (LAD) did recognize the need for a cognitive model of language based on a child's naturally learned knowledge of speech rather than on a language's top-down rigid syntax and rule-based grammar. This bottom-up approach was theoretically sound, but the relevant cognitive data were sparse and the computational power of the 1960s limited, so the vehement anti-war activist's LAD was ridiculed as just another amusing automatic speech recognition failure.
The English language has some 13 million words, so a semanticist/mathematician might posit a meaning function f(x) for a given word x; first because of the sheer number of different words, and second because of the uncertainties and ambiguities of meaning described above, f(x) is irredeemably dependent on context.

Since we cannot reduce the number of words (which increases daily), the words can first be grouped, for example by synonyms (different words with the same meaning) and homographs (the same written word pronounced in more than one way), to form associated groups. But the same word can have different meanings in context, as with homonyms (same sound and spelling but different meanings), homophones (same sound but different spelling and meaning), and heteronyms (same spelling but different sound and meaning).

Words, however, can have associations based on the above, and vectors can be employed to group associated words, decreasing the number of variables to form a reduced word-vector space.
Moreover, vectors, besides associating ostensibly disparate elements, can also specify direction and thereby quantify the separation between objects through their inner products; for example, synonyms could have an inner-product angular factor close to 1 (the cosine of the angle between the word vectors near 1, i.e., an angle near 0), meaning near-exact confluence of meaning. Furthermore, words represented by coordinates x, y, and z can be related by their invariant distance d, from the Pythagorean theorem x² + y² + z² = d² in three dimensions, extendible to arbitrarily many dimensions. Words can thus be further classified by their “closeness” to other words. So, in accord with the adage attributed to the linguist J.R. Firth,

You shall know a word by the company it keeps,

a truism that builds context into the understanding of a word; that is, the closeness of given words helps to ascertain their meaning.
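The inner-product notion of "closeness" can be sketched with cosine similarity; the three-dimensional toy embeddings below are made up for illustration:

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two word vectors, via the inner product."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

big   = np.array([1.0, 0.9, 0.1])
large = np.array([0.9, 1.0, 0.2])   # near-synonym: almost the same direction
lemon = np.array([0.1, 0.2, 1.0])   # unrelated: nearly orthogonal

print(round(cosine(big, large), 3))  # close to 1
print(round(cosine(big, lemon), 3))  # much smaller
```

Real word-vector spaces use hundreds of dimensions, but the same cosine measure ranks which "company" a word keeps.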
In the so-called Word2Vec models used in automatic speech recognition (ASR), a sentence is deconstructed into a multidimensional vector space of words positioned so that words sharing common meaning across different contexts are closer together in the invariant scalar-distance sense of a vector inner product or Pythagorean distance.

A skip-gram architecture uses a center word to predict the context in a surrounding window of context words, giving heavier weight to less distant context words and thereby helping to fix the word's meaning. A continuous bag of words (CBOW) model conversely predicts which word most probably belongs with a window of surrounding words, by summing the vectors of the words in the window.1
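How skip-gram and CBOW training examples are formed from a sentence can be sketched directly; the window size of two is an illustrative choice:

```python
# Forming training examples from a sentence, window = 2 context words per side.
sentence = "you shall know a word by the company it keeps".split()

def skipgram_pairs(words, window=2):
    """Skip-gram: (center -> context) prediction pairs."""
    pairs = []
    for i, center in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                pairs.append((center, words[j]))
    return pairs

def cbow_examples(words, window=2):
    """CBOW: (context list -> center) prediction examples."""
    examples = []
    for i, center in enumerate(words):
        context = [words[j]
                   for j in range(max(0, i - window), min(len(words), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples

print(skipgram_pairs(sentence)[:3])  # → [('you', 'shall'), ('you', 'know'), ('shall', 'you')]
print(cbow_examples(sentence)[0])    # → (['shall', 'know'], 'you')
```

Training the word vectors then amounts to making each prediction pair as probable as possible, which is what pulls words with shared contexts together in the vector space.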
An inner-product correlation can be used to classify an input frame vector against a reference vowel feature vector using convolutional artificial neural network pattern recognition. The input speech takes the form of a time sequence of spectral vectors obtained from the front-end spectral analyzers, forming a test pattern T as a concatenation of the spectral frame vectors t_i over the duration of the speech,

T = {t_1, t_2, …, t_I}

The test pattern T is compared with a set of reference patterns {R_j}, each comprising a sequence of spectral frames,

R_j = {r_1, r_2, …, r_J}

Minimizing the distance of T from each of the R_j then associates the input speech pattern with a reference template, and the global time alignment of the two patterns can be performed analytically using spectral-distortion measurement techniques.
One spectral distortion measure is based on frequency warping, the human non-linear, logarithmic perception of pitch, to model a wide-band spectrum with a frequency resolution close to that of the human auditory system.2

Reference templates for training can take the form of a non-rigid template or a statistical model. Templates are used in automatic speech recognizers, but even after training, their ability to adapt to different speakers, speaking styles, background acoustics, and electronic noise is limited, so they are typically used for very specific speech recognition tasks, such as recognizing responses to automated telephone prompts, or as recognition preprocessors that reduce the computational burden of a connected pattern-recognition artificial neural network.

Dynamic programming breaks a complex problem, such as a long spoken sentence, into sub-problems, solves each of them, indexes the solutions by their input parameters, and stores them in a matrix. When the same speech problem is encountered again, the solution can be looked up in the matrix by its index rather than solved anew, increasing computation speed and efficiency.
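The tabulation idea can be sketched with edit distance between two strings, each sub-problem solved once and stored in a table indexed by its input parameters (edit distance stands in here for the spectral-distortion alignment used in speech):

```python
def edit_distance(a, b):
    """Dynamic-programming edit distance: each sub-problem solved once."""
    table = {}  # (i, j) -> distance between a[:i] and b[:j]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 or j == 0:
                table[(i, j)] = i + j  # aligning against an empty sequence
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                table[(i, j)] = min(table[(i - 1, j)] + 1,          # deletion
                                    table[(i, j - 1)] + 1,          # insertion
                                    table[(i - 1, j - 1)] + cost)   # substitution
    return table[(len(a), len(b))]

print(edit_distance("recognize speech", "wreck a nice beach"))  # a classic ASR pun
```

Dynamic time warping in speech recognition fills an analogous table, with frame-to-frame spectral distances in place of character substitutions.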
Dynamic programming has been widely used in operations research to solve sequential decision problems, and it can be used to advantage in speech recognition to account for variation in past speech through time alignment and normalization.

To summarize: a front-end spectral analysis measures short-time speech parameters sequentially, producing a sequence of spectral feature vectors. This speech-pattern input is then compared with reference patterns, templates or statistical models, and short-time and global spectral distortions (dissimilarities) are calculated using dynamic programming. A further step is to treat an utterance as a whole in a cognitive sense; that is, natural-language acoustic modeling encodes the sound signal as a sequence of speech feature vectors whose frequencies are scaled not linearly but logarithmically (warped), to better model human auditory perception, which responds more acutely to logarithmic than to linear proportions.

These so-called perceptually warped features are augmented with first and second time derivatives, computed from smoothed differences of neighboring frames, to capture the significant temporal influences in speech recognition.
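The delta features can be sketched with a common regression formula over neighboring frames; the window size N = 2 is an assumption here, not the book's value:

```python
import numpy as np

def deltas(features, n=2):
    """First time derivative of a feature track, as smoothed frame differences."""
    padded = np.pad(features, n, mode="edge")  # repeat edge frames at the ends
    denom = 2 * sum(k * k for k in range(1, n + 1))
    return np.array([
        sum(k * (padded[t + n + k] - padded[t + n - k]) for k in range(1, n + 1)) / denom
        for t in range(len(features))
    ])

track = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # a linearly rising feature
d1 = deltas(track)   # ≈ 1 in the interior: the track rises one unit per frame
d2 = deltas(d1)      # second derivative ≈ 0 in the interior
```

Applying `deltas` once gives the first derivative and applying it again the second; both are appended to each perceptually warped feature vector.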
These primarily electromechanical constructs, using ANNs only for pattern comparison, can provide ASR for small vocabularies and limited speech variation; more general speech requires bottom-up deep artificial neural network learning.

In a conventional feedforward artificial neural network, supervised learning is followed by reinforcement learning, but a Deep Belief Network (DBN) first learns the probabilities of specific features from unsupervised training (the “belief”), and then, during supervised training, applies these feature detectors to classify the training-set data, thereby learning pattern recognition with a head start of “believed” features.

A speech recognition deep belief network can thus first act as a feedforward network, specifying the activation levels of the feature-specific neurons, and then, by running the network backwards, generate other features of the input data based on the learned speech. In this way it is well suited to recognizing the vagaries of natural-language speech, because the process resembles a child first learning some basic words without supervision at home by listening to its parents' speech, and then going to elementary school for formal supervised learning of vocabulary and grammar; that is, before supervised learning, the child already has some prior beliefs about the meanings of certain words and phrases and how to express them.
A DBN is therefore like a restricted Boltzmann machine (RBM) acting on probabilities to reveal latent factors in speech that help it to recognize the implicit meaning of individual speech features from context, usage, and all the other vagaries of spoken communication. Then, by gradient-descent backpropagation, the ground truth of the words and phrases can be collaboratively filtered to include the inferences so common to natural speech, thus improving natural language processing (NLP).
NLP employs two main statistical classification models, the discriminative and the generative. The discriminative model estimates a label for a given observation from the conditional probability P(Y | X = x) of a target Y given that the observable variable X takes the value x,
Examples are Decision Trees, Neural Networks, Logistic Regression, Cross-Entropy Cost Function, Restricted Boltzmann Machine, and Support Vector Machines, all discussed previously.
The generative model estimates the joint probability distribution P(X, Y) and computes the conditional probability from it,
Examples are the Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), Restricted Boltzmann Machine (RBM), and Generative Adversarial Network (GAN), discussed in turn below.
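The distinction can be made concrete with a toy joint distribution: the generative route models P(X, Y) and derives P(Y | X = x) from it by marginalizing over the observed value. The observables and numbers below are purely hypothetical.

```python
# Toy joint distribution P(X, Y) over an observable X (a sound type)
# and a target label Y; all names and numbers are illustrative only.
joint = {
    ("hiss", "consonant"): 0.30,
    ("hiss", "vowel"):     0.05,
    ("buzz", "consonant"): 0.15,
    ("buzz", "vowel"):     0.50,
}

def conditional(joint, x):
    """Generative route: from the joint P(X, Y), compute P(Y | X = x)."""
    # marginal P(X = x), summed over all labels
    px = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / px for (xi, y), p in joint.items() if xi == x}

probs = conditional(joint, "buzz")   # e.g. P(vowel | buzz) = 0.50 / 0.65
```

A discriminative model would instead fit P(Y | X) directly, never representing the joint distribution at all.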
In 1913, the Russian mathematician Andrei Markov took down his bookshelf copy of Alexander Pushkin's verse novel Eugene Onegin, not to read but to deconstruct, carefully writing out the first 20,000 letters and arraying them in 20 × 20 matrices, counting the vowels, and meticulously looking for patterns revealing a mathematical structure of verse that might be modeled.
Markov believed that unlike purely stochastic occurrences such as coin tosses, the letters in a sequence of words depend on prior outcomes in a chain of causation. That is, in Eugene Onegin, the chance that a certain letter appears in sequence depends on the letter that came before it, and indeed his sample contained 43% vowels and 57% consonants, distributed as 1,104 vowel-vowel pairs, 3,827 consonant-consonant pairs, and 15,069 vowel-consonant and consonant-vowel pairs; therefore (certainly) Eugene Onegin was not a random distribution of letters, but might have a (hidden) statistical character that could be mathematically modeled.3
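Markov's hand tally is easy to reproduce in a few lines. The sketch below counts adjacent vowel-vowel, consonant-consonant, and mixed pairs in an English letter sequence (Markov, of course, worked with the Russian alphabet).

```python
def pair_counts(text, vowels="aeiou"):
    """Count vowel/consonant adjacency pairs in a letter sequence,
    in the spirit of Markov's hand tally of Eugene Onegin."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = {"vv": 0, "cc": 0, "vc_cv": 0}
    for a, b in zip(letters, letters[1:]):
        av, bv = a in vowels, b in vowels
        if av and bv:
            counts["vv"] += 1
        elif not av and not bv:
            counts["cc"] += 1
        else:
            counts["vc_cv"] += 1
    return counts

counts = pair_counts("onegin")   # {"vv": 0, "cc": 0, "vc_cv": 5}
```

Normalizing such counts by row gives exactly the transition probabilities of a first-order Markov chain over the two letter classes.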
In the modern Hidden Markov Model (HMM), a Markov chain using Gaussian Mixture models (GMMs) is employed to generate probability distributions for the acoustic vector sequences produced by front-end electromechanical analysis. The individual Gaussians (the mixture) in the GMM generate the variables in the multiple dimensions required by speech recognition to form the distribution of vectors in a matrix.
However, context-dependent models clearly require a great deal of speech data to be accurate, so for more efficient use of the data, the HMM is divided into sub-HMMs for each triphone (sequence of three phonemes) and HMM decision-trees clustered by alpha-beta pruning to associate different speech states.
In human speech, sequences of utterances are meant to represent isolated words or phonemes, but a speech recognizer does not know a priori what the words are meant to mean; that is, the actual meanings are embedded in states hidden from the speech recognizer, but since the utterance is heard, there are observables (hearables?) from which the intended meaning of the words may be inferred based on the probabilities of occurrence.
In the simplest two-word case, for example, the inferences are made from the probability that word1 (w1) is followed by word2 (w2) or by word1, and that word2 is followed by word1 or word2, with the probabilities derived from grammatical constraints, syntax, usage, context, continuous speech data, and so on. The closeness of the words can then be represented by variance values. The state transition probability for this two-state system is a 2 × 2 matrix, for example with the probabilities expressed as variance values as shown,
Using the HMM, the highest-probability word at each point of the sequence can be chosen by comparing the probability that word1 is in that position with word2 having the complementary probability (1 − P(word1)). Repeating this for each element in the sequence gives the probabilities P(word1) and P(word2) for each element in the sequence, thus obtaining the most probable sequence for the two words in the utterance.4
Generalizing to multiple words, phonemes, sentences, phrases and so on will greatly increase the dimensions of the state transition probability matrices and add complexity to the hidden Markov model, but the general idea is as described above.
When each element in the principal diagonal of the matrix is a variance of one of the other elements, meaning that words are very close to each other in some manner such as described above (“variance” as “closeness”), the matrix is diagonally covariant, and such matrices thus represent speech relationships and are used to produce the joint probability distributions needed to associate the context- and time-dependent aspects of natural language speech.5
A simple hidden Markov model has state transition probabilities A, hidden state sequences Xi, an observation probabilities matrix B, and the observation sequences Oi, for the total sequence time T shown in the schematic figure below.
Since there are many more possible observables and their sequences than labeled training data, a probability distribution is calculated for each time step for the alignment of the input speech sequence with the training data. The state transition matrix A has probability elements aij, where aij = P(xt+1 = qj | xt = qi),
and the observation probabilities matrix B has elements bj(k), where bj(k) = P(Ot = k | xt = qj).
An HMM is represented by λ = (A, B, π), where π is the initial state distribution. Consider a simple four-state hidden state sequence X = (x0, x1, x2, x3) with observations O = (O0, O1, O2, O3), where the scalar values, for example a vowel (0), a hiss (1), and a pop (2), are observed in the four-state sequence as (0, 1, 0, 2).
The probability of state sequence X is the product of the initial state probability and the successive transition probabilities, P(X) = πx0 ax0x1 ax1x2 ax2x3, and multiplying in the observation terms gives the joint probability of the states and the observations, P(X, O) = πx0 bx0(O0) ax0x1 bx1(O1) ax1x2 bx2(O2) ax2x3 bx3(O3).
The probability of any sequence of utterances can thus be calculated for each state sequence, for example for the given observation sequence (0, 1, 0, 2); then, using dynamic programming, the state sequence with the highest probability is the best choice of word sequence.
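The dynamic programming step is the classic Viterbi algorithm. The sketch below finds the most probable hidden state sequence for a toy two-state λ = (A, B, π) and the observation sequence (0, 1, 0, 2); all of the probability values are invented for illustration.

```python
def viterbi(A, B, pi, obs):
    """Most probable hidden state sequence for an HMM λ = (A, B, π),
    found by dynamic programming over the observation sequence."""
    n = len(pi)
    # probability of the best path ending in each state at time 0
    prob = [pi[s] * B[s][obs[0]] for s in range(n)]
    paths = [[s] for s in range(n)]
    for o in obs[1:]:
        new_prob, new_paths = [], []
        for s in range(n):
            # best predecessor state for state s at this time step
            best = max(range(n), key=lambda r: prob[r] * A[r][s])
            new_prob.append(prob[best] * A[best][s] * B[s][o])
            new_paths.append(paths[best] + [s])
        prob, paths = new_prob, new_paths
    best = max(range(n), key=lambda s: prob[s])
    return paths[best], prob[best]

A  = [[0.7, 0.3], [0.4, 0.6]]            # state transition probabilities aij
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # observation probabilities bj(k)
pi = [0.6, 0.4]                          # initial state distribution
states, p = viterbi(A, B, pi, [0, 1, 0, 2])
```

Each hidden state here would stand for a word or phoneme, with the returned path giving the best-scoring interpretation of the utterance.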
The most basic probability distribution is simply a random (uninformative) Bayesian prior, and Bayes' formula is then used to update the probability as further data become available.
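As a minimal illustration of that updating, the sketch below applies Bayes' formula, posterior proportional to prior times likelihood, to two hypothetical word hypotheses; the numbers are invented for the example.

```python
def bayes_update(prior, likelihood):
    """Bayes' formula: posterior is prior times likelihood, renormalized,
    applied as further data arrive."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# hypothetical word hypotheses, starting from an even (uninformative) prior
prior = {"word1": 0.5, "word2": 0.5}
likelihood = {"word1": 0.9, "word2": 0.2}  # P(observed acoustics | word)
post = bayes_update(prior, likelihood)     # word1 now strongly favored
```

Repeated application of the same update as more frames of acoustic evidence arrive is exactly how a probabilistic recognizer sharpens its belief over time.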
As in all statistical probabilistic models, the theory can be mathematically dense, but the principles are relatively easy to understand and the computations can be handled efficiently by online software packages.6
Gaussian mixture models (GMMs) group data points into clusters within which they are Gaussian (normally) distributed. GMMs have been employed to model the spectral representation of a sound wave, and can classify groups of data for the representation of phrases and sentences. Factor analysis represents each data point as a weighted linear function of latent inferences in the data, thereby introducing sophisticated nuance into automatic speech recognition.
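A sketch of how a GMM scores a data point: given component weights, means, and variances (here an invented one-dimensional two-component mixture), the posterior "responsibility" of each Gaussian for the point follows from the weighted component densities.

```python
import math

def gauss(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def responsibilities(x, weights, means, variances):
    """Posterior probability that data point x was generated by each
    Gaussian component of a 1-D mixture; parameters are illustrative."""
    dens = [w * gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]

# two hypothetical clusters of acoustic feature values at means 0 and 2
r = responsibilities(1.9, weights=[0.5, 0.5],
                     means=[0.0, 2.0], variances=[1.0, 1.0])
```

In full ASR systems each component is multivariate with a diagonal covariance matrix, and these responsibilities drive the EM re-estimation of the mixture parameters.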
Speech recognition in general is critically dependent on time and sequence, so the HMM GMM word probabilities are typically forward-fed to a so-called Recurrent Neural Network (RNN), a network whose activations are time-dependent, employing Connectionist Temporal Classification (CTC), which can be used to train an RNN on time and sequence using Long Short-Term Memory (LSTM) networks. The total ASR system can be further refined by reinforcement and self-supervised learning, running against itself for self-improvement and a posteriori parameter optimization, producing greater accuracy.
In typical feedforward neural networks, a single input layer completely determines a static synaptic activation pattern throughout the other neuron layers. In a recurrent neural network (RNN), the neurons can be controlled to only fire for a limited time duration, so the activations of succeeding neurons in the synaptic pattern will be influenced and such influence can be carried on to succeeding neuron synaptic patterns, giving the RNN a temporal capability. A given neuron may even respond to its own earlier activation to connect an association, thus forming a temporally-controlled cascade of activation that can manifest a speech pattern based on time-based preceding patterns of activation. In this way it can be seen that the all-important timing of speech can be represented by timed neural firing patterns.
Apart from any hand-engineered associations germane to the particular speech recognition implementation, for example the close words “checking” and “balance” in bank telephony speech, the recurrence is typically performed by inner products of feature vectors that quantify the scalar correlation between the speech feature vectors.
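That inner-product recurrence can be sketched as a cosine similarity between feature vectors; the three-dimensional "word vectors" below are invented purely to show that related words score higher than unrelated ones.

```python
import math

def cosine(u, v):
    """Normalized inner product: a scalar measure of feature-vector closeness."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# hypothetical 3-d feature vectors; related words point in similar directions
checking = [0.9, 0.1, 0.2]
balance  = [0.8, 0.2, 0.3]
giraffe  = [0.1, 0.9, 0.1]
similar   = cosine(checking, balance)
unrelated = cosine(checking, giraffe)
```

Real systems use learned embeddings with hundreds of dimensions, but the scoring principle is the same: closeness of direction stands in for relatedness of meaning.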
The recurrent neural network thus can respond to stimulations depending on the prior presence of close feature vectors that account for related inferences, or the prior absence of signals to indicate lack of relation, thereby providing more accurate probabilities of later word feature occurrence.
That is, RNNs can connect relatable previous information to present meaning based on the closeness of the feature vectors. For example, the phrase “I grew up in France …” earlier in the text or speech can imply a recurrence with the later occurring phrase “I speak fluent _____”, where the earlier occurring word “France” generates a high inferential probability (manifested by the closeness of word feature vectors) that the blank should be the word “French”, even if the word in the blank space is garbled and there are many words and pauses in between. The RNN thus has the ability to provide a word through inferential association with earlier speech by the closeness of word feature vectors, even if the word is misspelled or unclearly written in text, or mumbled in speech.
On the other hand, if the later occurring phrase is “I also speak fluent _____”, then the association should not be made because of the word “also” implying another language and the prior word “France” can be deactivated in this instance, while other countries’ names can be activated, possibly through prior occurrence.
Another example is uptalk (voice lifting in pitch and inflection at the end of a sentence) where a statement may be mistaken for a question, and can be determined by earlier instances of similar uptalk occurrence. In this way, the RNN can produce a word scores matrix in accord with the context of the speech at issue and a speaker's particular speech intonation.
If the first part of a subject occurred at the beginning of the speech and the last part near the end, with a considerable span between the related phrases, the RNN's sequential activation can place the first part in a stored state, controlled by the RNN with time-delay and feedback-loop capability, for reactivation as needed when the last part appears.
If the earlier occurring feature is more extensively referred to later, then the stored controlled state may be recorded entirely in another network or data graph that incorporates time delays and can be fed back to the RNN.7
Recurrent neural networks can use all the deep neural network convolutions, regularizations, and other feedforward techniques to more accurately perform speech and text recognition. These techniques are particularly useful for any task where memory of past events, thoughts, and features is significant for real-time processing. Marcel Proust's Remembrance of Things Past is a striking literary example of a human RNN describing his impressions of past events to the minutest detail, with allusions to Nature, literature, music, art, emotions, psychology, society, etiquette, repartee and so on ad infinitum.
Since recurrent neural network backpropagation is performed not only through event layers but also through temporal layers, the problems of vanishing and unstable gradients can result in slower, and sometimes even null, learning.
This problem is addressed by controlling the gradient instability, limiting the backpropagation through the use of gated states or gated memory, collectively termed gated recurrent units (GRUs), in the incongruously named Long Short-Term Memory (LSTM) network. The name only makes sense in the context of the automatic speech recognition (ASR) gated recurrent unit regime.
The acronym-laden ASR LSTM GRU can add or block information by means of, for instance, three sigmoid function gates in a sigmoid layer that lets information through greyscaled from 0 to 1. So if an earlier word's occurrence is helpful in predicting a current or future word, its activation will be passed on by an “ON” (1) as a long-term open-gated recurrent word, while other long-ago words that are not helpful (meaning their word vector features are not close to the present speech) will, in the short-term speech processing, be logic-gated (blocked or forgotten) by the GRU as an “OFF” (0) state.
This explains the “long” as a long distance or time away (that is, not a “close” word vector), and “short-term” as just needed for this particular short real-time prediction of the meaning of the word or phrase in question. LSTM RNNs therefore can learn the long-term dependencies important for the immediate needs of speech recognition, printed or handwritten text recognition, and speech synthesis.
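The gating described above can be sketched for a single scalar LSTM cell; the weights below are hypothetical stand-ins, and a real cell operates on vectors with trained weight matrices and bias terms.

```python
import math

def sigmoid(z):
    """Squash a value onto the 0-to-1 greyscale used by the gates."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One gated step of a (scalar) LSTM cell: three sigmoid gates decide
    what to forget, what to write, and what to output."""
    f = sigmoid(W["f_x"] * x + W["f_h"] * h_prev)          # forget gate
    i = sigmoid(W["i_x"] * x + W["i_h"] * h_prev)          # input gate
    o = sigmoid(W["o_x"] * x + W["o_h"] * h_prev)          # output gate
    c_tilde = math.tanh(W["c_x"] * x + W["c_h"] * h_prev)  # candidate memory
    c = f * c_prev + i * c_tilde   # gated long-term cell state
    h = o * math.tanh(c)           # gated short-term output
    return h, c

# hypothetical scalar weights, not a trained model
W = {"f_x": 1.0, "f_h": 0.5, "i_x": 1.0, "i_h": 0.5,
     "o_x": 1.0, "o_h": 0.5, "c_x": 1.0, "c_h": 0.5}
h, c = lstm_step(1.0, 0.0, 0.0, W)   # one step on input x = 1
```

A forget gate near 1 carries the cell state (the "long" memory) forward across many steps, while a gate near 0 blocks it, which is exactly the ON/OFF greyscale gating described above.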
A Connectionist Temporal Classification (CTC) is typically used for training recurrent neural networks employing LSTM to do sequences where the timing is variable and reinforcement learning is employed, as required in natural language processing.
A CTC network trains the RNN by taking the word scores matrix from dynamic programming and then infers the speech or text pattern from the state transition probability matrices. The neurons in recurrent neural networks thus are continuously changing in a dynamic way, much like a biological brain, and therefore CTCs are effective for modeling processes that change with time in a sequential manner, for example cursive (connected) handwriting recognition and natural language speech. Even audio-visual speech recognition (AVSR) models that link sound with visual observables, such as hand gestures and lip-reading, have been developed.8
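The many-to-one mapping at the heart of CTC can be illustrated by its collapse rule: merge repeated symbols, then drop the blank symbol, so that many frame-level alignments of network output decode to the same label sequence.

```python
BLANK = "_"

def ctc_collapse(path):
    """CTC decoding rule: merge repeated symbols, then drop blanks, so many
    frame-level alignments map to one label sequence."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

# a frame-by-frame network output aligned over time collapses to the word
decoded = ctc_collapse(list("hh_eel_ll_oo"))   # "hello"
```

The blank lets the rule keep genuine doubled letters: "l _ l" collapses to "ll", whereas "l l" collapses to a single "l". During training, CTC sums the probabilities of every alignment that collapses to the target transcript, which is what frees the RNN from needing pre-segmented audio.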
In summary, most automatic speech recognition systems represent speech as a sequence of perceptually warped feature vectors and are augmented with smoothed differences of neighboring frames acting as the first and second time derivatives. The probabilities of feature vector sequences are modeled by Hidden Markov models with Gaussian mixture models (GMMs) where the HMM is constructed from sub-HMMs for each triphone, and the individual Gaussians are all diagonally covariant matrices.
Clustering the HMM states using alpha-beta pruned decision trees can produce desired parameter-tying. The HMM GMM word probabilities typically employ Recurrent Neural Networks (RNN) which in turn use Connectionist Temporal Classification (CTC) and Long Short-Term Memory Networks (LSTM) to process time and sequence, and reinforcement and self-supervised learning can produce parameter optimization for greater accuracy.
Automatic speech recognition follows the arc of natural language, fraught as it is with uncertainty, ambiguity, and inference; natural language processing thus is an exceedingly complex top-down signal processing endeavor that requires a succession of bottom-up artificial neural networks and techniques to succeed.
All the network computations described above can be efficiently performed using Google's TensorFlow or the PyTorch platform, employing TPU parallel processing for automatic matrix operations and calculus differentiation, and a computation graph allows code reuse and extension by any interested programmer.
In addition to setting new records for accurate text and speech recognition, a recurrent neural network learned the character-by-character sequence used in the high-level computer program language Python, and in a sequential, dynamic way learned how to write computer programs in Python, threatening the very livelihood of computer programmers worldwide.9
Things would become even more serious for writers: in 2020, OpenAI introduced the Generative Pre-trained Transformer-3 (GPT-3), an unsupervised language machine capable of almost any language task, founded upon pre-training on enormous unlabeled training sets. The generative output of the model assumes a linear dependence on its own previous values plus a stochastic term, forming a recurrence-relation autoregressive model with discriminative fine-tuning.
GPT-3 currently has 175 billion ML parameters, with 410 billion byte-pair-encoded tokens from Common Crawl, 19 billion tokens from WebText2, 12 billion from Books1, 55 billion from Books2, and 3 billion from Wikipedia. In addition to prose and poetry, GPT-3 in principle can code in CSS, JSX, and Python, and does not require further training to compose almost anything in the English language.
Potential users can access a GPT-3 toolset through a text-in/text-out API from GitHub, and before long the writers of books and articles, as well as computer programmers, will all go the way of the dodo bird.
Strange as it may seem, deep recurrent neural networks designed for natural language processing have been employed in epidemic spread models, and enlisted for one of biology's grandest challenges, predicting the three-dimensional structure of proteins from their amino acid sequences to design drugs that can metabolically act to treat infections and develop vaccines.10
In the past, protein folding studies were primarily performed by freezing a protein into a crystal-like structure and utilizing x-ray crystallography to examine the folding process in instrument-rich and time-consuming procedures. Artificial neural networks can perform protein folding typically in a matter of hours, and new techniques are being developed to reduce the time to seconds.
The covid-19 virus is depicted in the figure below, with the characteristic spike proteins looking like plugs or handles arrayed on the surface, hence the name corona. The figure at right schematically shows an unfolded protein folding into its folded form, the physical process by which a protein chain acquires its biologically functional conformation in a 3D structure.11
There are four stages of folding: primary (the amino acid sequence held together by peptide bonds), secondary (the protein folded as an alpha helix held together by hydrogen bonds in the direction of the helical axis, or as beta pleated sheets held together by hydrogen bonds in an S-shape), tertiary (the protein folded into a 3D conformation held together by non-covalent interactions between side groups), and quaternary (individual peptide chains bound to other peptides).12
A virus interacts with the human body host cells through entry into the angiotensin-converting-enzyme 2 (ACE2) receptors and spreads from there into host cells that transmit the virus throughout the body.
The protein's conformation dictates the protein's function; protein folding for therapy or immunization is an exercise in finding a protein conformation that does what you want it to do, here binding onto the covid-19 spike proteins, thereby blocking their ability to enter ACE2 receptors and effectively preventing the virus from having any physiological effect.
Therefore, the task for artificial intelligence is to design such a protein through protein folding producing the desired conformation to do what you want it to do. Google's DeepMind AlphaFold won the Critical Assessment of Protein Structure Prediction (CASP) competition in 2018 by a sizable margin over other competitors.
In a first step, AlphaFold employs a deep neural network to extract features from a training dataset and then searches for plausible protein structures having those features. It compares a protein's amino acid sequence with similar ones in the training set to find pairs of amino acids that appear in tandem but do not lie next to each other in a chain, implying that they are positioned near each other in the folded protein, a process called multiple sequence alignment. The DNN was trained to take the pairings and predict the distance between them in the folded protein. The predictions were then compared to precisely measured distances in known proteins, thereby enabling realistic guesses as to how the proteins may fold.
A parallel-running DNN would predict the angles of the joints between consecutive amino acids in the folded protein chain. The two parameters then could be combined to produce a folded protein structure designed to perform a desired protein interaction such as binding and blocking the spike proteins of the coronavirus.
This theoretical protein folding design process, however, can produce structures that may not be physically possible. The DNNs are therefore trained on actual protein structures, and the cost function is minimized by gradient descent to come closest to a folding arrangement consistent with the predictions of the amino acid sequences produced in the first step, thereby producing a physically viable antiviral protein.
A DNA-based process for developing an antiviral vaccine mimics a part of the coronavirus's genetic sequence, giving the immune system a preview of the virus in order to generate antibodies without causing the disease itself, readying the immune system to attack any actual infection from the virus.13
Harvard Medical School's one-step protein folding algorithm, a deep recurrent geometric neural network based on natural language processing techniques, was trained on a dataset of amino acid sequences mapping to known (and therefore possible) protein structures, performing the end-to-end sequence-to-structure procedure in milliseconds. The code is publicly available on GitHub in hopes of wide-ranging dissemination and crowdsourced access.
The conveyor of the information in DNA that instructs the cell to make proteins from the amino acid sequence is called messenger RNA; mRNA is synthesized in the cell nucleus using the nucleotide sequence of DNA as a template, and is then translated into protein by the cell's ribosome factories. Among others, Moderna has concentrated on mRNA protein folding AI to develop vaccines against the coronavirus.
A viral pandemic will peter out naturally because two things happen: (1) infected people produce antibodies and recover, and (2) infected people do not recover and die, depriving the virus of a host to live on and spread. The presence of antibodies can be used as a test for the disease, the plasma containing antibodies (but not red blood cells) can be infused in patients for convalescent serum immunotherapy, and of course vaccines can prevent infection.
The rub is the virus’ ability to mutate. The defenses to viral mutations are new antibody plasma and mutation-specific protein folding, and with more and more data available, data-dependent artificial intelligence machine learning can be marshaled to design new treatments and vaccines.
Certain parts of a virus's surface proteins have a high turnover rate, producing mutations. Over the past year, tens of thousands of coronavirus samples from patients around the world have been genetically sequenced and uploaded into the Global Initiative on Sharing All Influenza Data (GISAID) hosted in Germany. AI algorithms compare those sequences to find which segments of the virus change frequently and which do not, helping to identify mutation hot spots. Then (hopefully) the same protein-folding processes can be performed to meet the mutations.14
The earliest speech synthesizers were developed hundreds of years ago by scientific luminaries such as Albertus Magnus, Roger Bacon, and Charles Wheatstone. These articulatory synthesis systems mechanically model the human vocal tract, with vocal fold biomechanics, glottal aerodynamics, and acoustic wave propagation in the biomechanical bronchia, trachea, nasal and oral cavities of mechanical talking heads powered by puffs of air from bellows.
More recently, Bell Labs developed a voice codec (vocoder) it called the Voder for electrically generated telephone speech. As diagrammed in the schematic figure below, an operator creates vowels by depressing a wrist bar producing nasal buzz tones, while consonants are generated by a white-noise tube producing a hiss, with a foot pedal to control pitch; the plosive “p” and “d” and the affricate “j” and “ch” are activated by spectrum keys that select from ten band-pass filters to modulate these basic sounds into combinations of speech, which are transmitted to a loudspeaker for demonstrations of synthetic speech.
The Voder was displayed at the 1939 New York World's Fair with the greeting, “Good afternoon, radio audience”. Needless to say, intelligible speech generation required no little training and skill of the operator.1
In the late 1940s, speech synthesis pioneer Franklin Cooper developed the Pattern Playback machine, which converted sound-spectra spectrographs of patterns of speech into audible synthesized speech, and in 1961, Bell Labs employed the ubiquitous IBM 704 computer to synthesize “Daisy Bell”, a song that was subsequently played by the computer HAL in Arthur C. Clarke's screenplay for the film “2001: A Space Odyssey”.
With the advent of the computer, the rudimentary air, electricity, and sound spectra spectrograph sources of synthesized speech could be replaced by computer software and refined for greater verisimilitude.
Linear predictive coding (LPC) for speech synthesis was developed at Nagoya University in Japan, electronic talking heads were developed in America and Japan using digital synthesis to produce articulatory speech firmware, and Texas Instruments' LPC microprocessors were used in its Speak & Spell toys popular in the late 1970s.
Japan's NTT in 1975 developed Line Spectral Pairs (LSP), which mathematically pair the LPC equation predictor coefficients a_i for improved stability and resonance, and these LPC filters were used to more closely match electronic speech waveforms. The LSP technique was subsequently adopted in the 1990s as the international speech-coding standard for mobile telephony and the Internet.2
With more accuracy, speech synthesis could be extended to more general uses. The first text-to-speech (TTS) system was developed in 1975 by Italy's Centro Studi e Laboratori Telecommunicazioni (CSELT) with the Multichannel Speaking Automation (MUSA) dedicated computer and diphone-synthesis software. MUSA was able to read aloud and sing Italian songs from printed text. Later, Bell Labs, MIT, and Digital Equipment Corporation in the 1980s developed the TTS DECtalk Natural Language Processing (NLP) computer for multilingual text-to-speech synthesis.
The basic TTS process is for a frontend processor to convert written text to a phonemic representation: first translating numbers and abbreviations into the equivalent written words (text normalization or tokenization); then distinguishing homographs, for example whether “read” should be voice-synthesized as “red” or “reed” (determined by part-of-speech tagging); assigning phonetic transcriptions to each word; and segmenting the text into word, phrase, clause, and sentence prosodic (pitch contour and phoneme duration) units. A backend processor then performs prosody prediction and generates the waveforms for discrete-to-continuous synthesized speech, as shown in the continuous backend processing flowchart on the left of the figure below.3
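The frontend steps just described can be sketched minimally as follows; the tiny lexicons and the tense hint are invented stand-ins for real pronunciation dictionaries and part-of-speech taggers.

```python
import re

# Toy resources; every entry here is an illustrative stand-in for real lexicons
NUMBER_WORDS = {"2": "two", "10": "ten"}
LEXICON = {"she": "SH IY", "two": "T UW", "books": "B UH K S"}
HOMOGRAPHS = {"read": {"past": "R EH D", "present": "R IY D"}}

def normalize(text):
    """Text normalization (tokenization): expand digits into written words."""
    tokens = re.findall(r"\w+", text.lower())
    return [NUMBER_WORDS.get(tok, tok) for tok in tokens]

def phonemize(tokens, tense_hint="past"):
    """Assign phonetic transcriptions; a POS hint disambiguates homographs."""
    phones = []
    for tok in tokens:
        if tok in HOMOGRAPHS:                       # part-of-speech tagging stand-in
            phones.append(HOMOGRAPHS[tok][tense_hint])
        else:
            phones.append(LEXICON.get(tok, "?"))
    return phones

tokens = normalize("She read 2 books.")
print(tokens)             # → ['she', 'read', 'two', 'books']
print(phonemize(tokens))  # → ['SH IY', 'R EH D', 'T UW', 'B UH K S']
```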
In so-called concatenative synthesis, segments of recorded speech in the form of electronic waveforms are strung together, and although an individual string may sound quite natural, there are noticeable pauses between strings that everyone has experienced in early automated telephone answering services. A typical system is shown schematically in the figure below at right.
Domain-specific synthesis employs a set of pre-recorded words and phrases that are natural sounding but limited to the domain, such as for the early talking dolls, scales, and clocks. In diphone synthesis, one sample from a larger set of sound-to-sound transition speech is used for each word, and linear predictive coding or discrete cosine transforms can be applied to the diphones to provide sentence prosody, but because of the database's limitation to one sound per word, the speech cannot help but sound robotic.
In formant synthesis, the synthesized speech is wholly formed by electronic signal processing of the frequency sound-spectrum amplitude peaks (formants), either adding sine waves together (additive synthesis) or using mathematical physics models to create whole new waveforms. Although capable of more generalized synthesis, this predictably produces rather electronic-sounding speech.
Because of the sophistication of digital signal processing, digitally synthesized speech can provide articulation that eliminates the unnatural pauses of the concatenative and formant synthesis models and can produce quite natural speech, while modulations of prosody and intonation can produce emotion and tone, particularly useful for humanoid robots.
Synthesized speech in response to human speech commands or questions of course first requires the recognition of the text or spoken input and then the choice of an appropriate response from the files in its synthetic speech database.
In unit selection synthesis, recorded utterance waveforms are divided into phones (any speech sound), diphones, half-phones, syllables, and morphemes (minimal grammatical units of a language that cannot be broken down into smaller independent grammatical parts) to form the files in the database.
To construct a sentence, an index of these units is built from the segmentation and the acoustic parameters of pitch, duration, position, and neighboring phones, so that related words and phrases can be classified and synthetic speech for whole sentences can be constructed using a weighted decision tree over the most probable chain of units derived from the database.
Artificially intelligent robots clearly require speech synthesis. For example, Hidden Markov Models can include complete dictionaries that can be searched for pronunciations based on spelling or rules or combinations thereof to handle part-of-speech tagging. Deep neural networks can train the model from recorded speech datasets to produce natural-sounding words, and HMMs can probabilistically model the sound spectrum, pitch, and duration of word waveforms to form natural-sounding sentences.4
Then, just as in speech recognition systems, recurrent neural networks, LSTMs, CTC, and other networks and models using supervised and self-supervised learning can be employed to refine text-to-speech and voice synthesis.
These technologies are used in the more modern speech synthesizers such as DeepMind's WaveNet, Google's Tacotron, and Baidu's DeepVoice. Going further, Adobe Voco and Google's WaveNet are audio-editing and -generating software tools that can be trained on a voice sample to produce synthesized speech that closely mimics a particular speaker, generating characteristic speech that, through the employment of speech recognition inferences, can even include phonemes that were not in the training dataset.
Natural language synthetic speech, like all technological innovations, can be and has been abused, for instance by unethically putting words into the mouths of public figures in commercials and parodies, or for adversarial political gain.
However, one public figure's thoughts were synthesized not for nefarious, comedic, or political gain, but rather for exposition of the deepest mysteries of the Universe. The late renowned theoretical physicist Stephen Hawking's eerily robotic voice at first used DECtalk, but this required him to type the words for TTS synthesis, and he was increasingly unable to do so as his hand muscles degenerated from ALS.
In extraordinary displays of sensor technology and artificial intelligence, subsequent speech synthesis systems developed by Intel, SpeechPlus, and Hawking's graduate assistants, followed twitches in his cheek muscle to predict word selection from a deep neural network trained on his books, papers, and speeches; for example, he had merely to twitch his cheek muscle in a particular way for the word “the” and a recurrent neural network immediately produced the contextually concatenated inferred words “black hole”.
Stephen Hawking's AI-deduced speech, which allowed him to live a far more productive life to the good of science, ironically also included words on the dangers of the looming AI singularity,5
I fear that AI may replace humans altogether. If people design computer viruses, someone will design AI that improves and replicates itself. This will be a new form of life that outperforms humans.
and
It will either be the best thing that's ever happened to us, or it will be the worst thing. If we are not careful, it very well may be the last thing
This pronouncement was made all the more dramatic precisely because it was itself robotically generated.
Hawking's warnings indeed might materialize, but for now the AI singularity, at least for speech synthesis, has not yet come to pass, for no one would mistake his synthesized words for natural human speech. This elucidates the fact that although synthesized speech has progressed to the point of intelligibility and generality, mostly because of awkward pauses and strange syllabic emphasis, synthetic speech is seemingly forever hampered by a lack of naturalness.
Human speech recognition and text-to-speech synthesis technologies were used in IBM's WatsonQA Jeopardy and Project Debater Grand Challenges. They both of course required, in addition to text recognition, data, analytics capabilities, speech formulation and delivery.
Similarly to WatsonQA, Miss Debater's debating skills included automatic speech recognition employing deep convolutional recurrent neural networks and long short-term memory networks that could “listen” to and comprehend the course of the debate. Her response was then formed by her claim detection engine, which found the claim in her database, determined the claim boundaries, and scored the evidence for relevance and persuasiveness.
Miss Debater's argument stance and sentiment were founded on her deep neural networks’ deep argument mining from high-quality labeled data with voluminous automatically-labeled data.
The deep argument mining used knowledge graphs that gathered information from many sources (such as Wikipedia and the CIA World Factbook) comprising billions of facts that were organized relationally in so-called knowledge boxes to assess controversies and dilemmas and model the commonalities and discrepancies of the information data.6
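The relational organization of such facts can be illustrated with a miniature triple store; the facts and the query helper below are invented for illustration and bear no relation to IBM's actual engine.

```python
# A miniature knowledge graph: facts are (subject, relation, object) triples,
# and queries match patterns against them (all facts here are toy examples)
FACTS = [
    ("Germany", "capital", "Berlin"),
    ("Germany", "memberOf", "EU"),
    ("France", "capital", "Paris"),
    ("France", "memberOf", "EU"),
]

def query(subject=None, relation=None, obj=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [(s, r, o) for (s, r, o) in FACTS
            if subject in (None, s) and relation in (None, r) and obj in (None, o)]

# A commonality across the data: which entities share EU membership?
print(query(relation="memberOf", obj="EU"))
# → [('Germany', 'memberOf', 'EU'), ('France', 'memberOf', 'EU')]
```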
Responses were transcribed and text-to-speech synthesized so that claims, rebuttals, and arguments were offered in continuous and inflected speech for cogent, intelligible, and persuasive arguments, abetted at times with incongruous robot humor. For this, IBM developed TTS algorithms employing expressive synthetic speech models with predictable phrase breaks and word- and sentence-emphasis.
All this historic debate lacked was a curtain hiding the debaters from the audience's view, for the debate could have been a Turing Test if, after the debate, the host had asked the audience to distinguish the human from the machine. Had they been unable to, Miss Debater would have established intelligence at least equivalent to a human's, and not just any human, but an accomplished champion human debater. Then, had more of the audience changed their view to Miss Debater's proposition, her victory would have definitively marked the arrival of the AI Singularity, and all that that portends.
However, a clear giveaway in the Turing Test would have been the speech synthesizer's at times peculiar enunciation of obscure technical terms and foreign words, and its unnatural pauses, even when delivered with colloquial fluency.
And this, it may be surmised, was a factor in Miss Debater's loss to Harish Natarajan. Although the female voice was used to soothe fears, because of its electronic synthesis it remained irredeemably robotic, and while robot humor may put at ease and amuse, it may also dismay.
Furthermore, Miss Debater's ominously challenging opening statement did her no good in winning over an audience composed entirely of humans; portents of unalloyed robot hubris can easily diminish any good will of the human victims.
Just as in the case of the animosity displayed against Deep Blue, the human audience likely subconsciously sided with the human, revealing a deep-rooted psychic fear of machines besting humans.
A machine won all the objectively scored challenges; the only contest that the human won was the subjectively judged debate. Perhaps an audience of more robot-appreciative geeks, or of robots themselves, would have voted for Miss Debater.
In retrospect, Harish Natarajan won at least in part because Miss Debater's speech, no matter how well synthesized, would nevertheless create some cognitive dissonance, while the urbane and unaffected Natarajan's debating delivery no doubt produced an attractive resonance that no synthetic voice could then, or perhaps ever, match.
That robots will eventually replace almost all assembly-line workers and for-hire vehicle drivers is already a looming certainty, and machines that can play video games better than human beings can be grudgingly accepted; after all, they are both computer-generated, and perhaps fewer of our youth will become addicted. Even medical doctors and lawyers on the verge of extinction is believable, but to beat the Masters of chess and Go, the two archetypes in popular conception of supreme human intelligence, and to be able to debate a champion debater from Cambridge, that is ability far surpassing all of us ordinary humans.
If all of that is not a portent of the end of humankind's pre-eminence on this Earth, it will only be because we humans have arrested the AI robot's growth, and if that is indeed what comes to pass, the consequences for good or evil will never be known. But so far robots have helped to improve the world and society, as demonstrated by the history of their development.
After the evacuation from Dunkirk and the surrender of the Low Countries and France, an air attack proposed by Hermann Göring was set to cripple Britain's naval and air defenses, followed by a blockade and Hitler's Operation Sea Lion cross-Channel invasion, altogether designed to force Britain to sue for peace, freeing the Nazis to turn East in their violent pursuit of lebensraum.
The campaign against Britain began in the Summer of 1940 when hundreds of Heinkel, Dornier, and Junkers heavy bombers and Ju-87 Stuka dive bombers pounded Britain's ports, shipping centers, airfields, and infrastructure.
When the bombers arrived in daylight with Messerschmitt fighter cover, Britain relied on human lookouts and telephones for communication, and the early warning allowed Britain's Hawker Hurricane and Spitfire fighters time to courageously rise to meet the enemy in the air. Many of the bombers were brought down by the Hurricanes, which were in turn preyed upon by the Messerschmitts, with whom the Spitfires fought in an air combat of relentless and horrific attrition.
Half of the defending pilots and aircrew, some 520 men, were killed in the air battle. Why were they alone in the defense? Where were the anti-aircraft guns? Indeed there were 264 anti-aircraft guns with the number doubling in two days, but they could not hit the enemy aircraft and indeed it was Churchill's “Few” to whom “so much [was] owed” who saved the day in the First Battle of Britain.
On October 14, 1940, 380 Heinkel and Junkers bombers arrived over London, and although 8,326 antiaircraft rounds were fired, the AA guns shot down only two of the slow-moving heavy bombers flying in formation. The difficulties were summarized thusly:1
It isn't easy to shoot down a plane with an anti-aircraft gun ... Instead of sitting still, the target is moving at anything up to 300 m.p.h. with the ability to alter course left or right, up or down. If the target is flying high it may take 20 or 30 seconds for the shell to reach it, and the gun must be laid a corresponding distance ahead. Moreover the range must be determined so that the fuse can be set, and above all, this must be done continuously so that the gun is always laid in the right direction. When you are ready to fire, the plane, though its engines sound immediately overhead, is actually two miles away. And to hit it with a shell at that great height the gunners may have to aim at a point two miles farther still. [Only] then, if the raider does not alter course or height, as it naturally does when under fire, will the climbing shell and the bomber meet. In other words the raider, which is heard overhead at the Crystal Palace, is in fact at that moment over Dulwich; and the shell which is fired at the Crystal Palace must go to Parliament Square to hit it.
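The aiming arithmetic in the passage above can be made concrete. Assuming, as the quote warns, that the raider holds course and speed, and treating the shell as traveling at a constant effective speed (a gross simplification of the real ballistics), the aim-ahead point comes from solving a quadratic for the shell's flight time; all positions and speeds below are illustrative.

```python
import math

def aim_ahead(target_pos, target_vel, shell_speed):
    """Find where shell and bomber meet: solve |p + v*t| = s*t for flight
    time t, then lead the gun to the target's position at that time."""
    px, py = target_pos
    vx, vy = target_vel
    # quadratic in t: (v.v - s^2) t^2 + 2 (p.v) t + p.p = 0
    a = vx * vx + vy * vy - shell_speed ** 2
    b = 2 * (px * vx + py * vy)
    c = px * px + py * py
    t = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)  # positive root (a < 0)
    return (px + vx * t, py + vy * t), t

# Bomber 3 km downrange at 5 km altitude, crossing at 120 m/s; shell at 800 m/s
(aim_x, aim_y), t = aim_ahead((3000.0, 5000.0), (120.0, 0.0), 800.0)
print(round(t, 1), round(aim_x), round(aim_y))
```

With a flight time of roughly eight seconds, the gun must be laid nearly a kilometer ahead of where the bomber is heard — exactly the Crystal Palace-versus-Dulwich effect the passage describes.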
Fully aware of the aiming problems, anti-aircraft gun-laying was often relegated to blanket firing to an altitude in an area in front of where the bombers were believed to be proceeding, hoping that they would simply fly into the hail of exploding proximity-fuse shells and destroy themselves.
Needless to say, such wishful tactics could not stem the tide of bombing; the tracking of the bombers and the aiming of the AA guns had to improve.
In the succeeding nighttime bomber raids of the Blitz that terrorized London well into 1941, the Hurricanes and Spitfires could not see the enemy to engage them in air battle. Britain's air defense thereupon fell entirely on the anti-aircraft guns, and although floodlights and fixed-baseline acoustic locators could spot the formations of approaching bombers, because of the shortcomings of the anti-aircraft aiming systems, London at night was virtually defenseless,2
We had depended on anti-aircraft guns … and apart from a solitary salvo loosed at the beginning of the raids, no gun had been shot in our defence … we felt like sitting ducks ….
Britain's best minds were brought to bear on the problem at the Royal Antiaircraft Command under the direction of the distinguished physicist P.M.S. Blackett, and included the well-known mathematical physicists Ralph H. Fowler, Douglas Hartree, and Edward A. Milne.
Scientific anti-aircraft targeting begins with the mathematical physics of these two exemplary non-linear ballistic differential equations for the trajectories of projectiles fired from AA guns in the xy-plane as functions of the time t,
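The two equations themselves appear to have been lost in reproduction. A standard form consistent with the symbols defined just below (quadratic air drag opposing the velocity, gravity acting along y) would be:

```latex
m\ddot{x} = -\tfrac{1}{2}\rho\, C_d A\, v\,\dot{x}, \qquad
m\ddot{y} = -mg - \tfrac{1}{2}\rho\, C_d A\, v\,\dot{y}, \qquad
v = \sqrt{\dot{x}^2 + \dot{y}^2}
```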
where g = 9.8 m/s² is the gravitational acceleration, m is the mass of the projectile, ρ is the density of the air, C_d is the drag coefficient, which depends on the geometry of the projectile, and A = πd²/4 is the frontal area of the projectile.
Non-linear differential equations cannot be solved in closed form, so they were set up arithmetically and young women were recruited to numerically compute the ballistic firing tables using adding machines.3
However, the more than 750 multiplications for each trajectory, with 2,000 trajectories per calculation, were something that even extremely diligent humans could not accurately perform by hand, keeping in mind that any errors could have devastating consequences. Fortunately, the new differential analyzer machines (also operated by young women) could produce the theoretical trajectories of the anti-aircraft shells in a more timely fashion.
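The step-by-step arithmetic behind such a firing-table entry can be sketched with a crude forward-Euler integration of the drag equations; the shell constants below are illustrative round numbers, not the wartime ballistic data.

```python
import math

# Illustrative constants, roughly a heavy AA shell (SI units; not historical data)
g, rho, Cd, d, m = 9.8, 1.225, 0.3, 0.09, 10.0
A = math.pi * d ** 2 / 4   # frontal area of the projectile

def trajectory(muzzle_speed, elevation_deg, dt=0.01):
    """Integrate the drag equations step by step until the shell returns to
    the ground; return horizontal range and time of flight."""
    theta = math.radians(elevation_deg)
    x, y = 0.0, 0.0
    vx, vy = muzzle_speed * math.cos(theta), muzzle_speed * math.sin(theta)
    t = 0.0
    while y >= 0.0:
        v = math.hypot(vx, vy)
        drag = 0.5 * rho * Cd * A * v / m   # drag deceleration per unit velocity
        vx -= drag * vx * dt
        vy -= (g + drag * vy) * dt
        x += vx * dt
        y += vy * dt
        t += dt
    return x, t

# One row of a firing table: range and time of flight at 45 degrees elevation
rng, tof = trajectory(820.0, 45.0)
print(round(rng), round(tof, 1))
```

Each pass through the loop is one of the hundreds of multiplications per trajectory that the human computers performed by hand.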
But the antiaircraft guns themselves still had to be fire directed to first guide those shells out of their muzzles to engage the projectile differential equations in the air to hopefully shoot down the Nazi bombers.
The artillery shell trajectory mathematics was sound and the solutions true, with implementation helped along by no little anti-aircraft gun-laying heuristics. But the initial reckoning depended on manually operated optical trackers that supplied target range and bearing values, from which rate-of-change calculations were compiled in derivative ballistic firing tables; these tables were consulted to mechanically turn the shafts and gears of the fire directors of the antiaircraft guns, setting elevation, range, and direction.
By the time the gun was ready to fire, however, the targets had gone on, conditions had changed, and the whole targeting procedure had to be repeated, often to little or no avail.
Meanwhile the Nazis were preparing the fast V-1 and V-2 rockets for “flying bomb” attacks that would be even more devastating than the bombers; the situation was dire.
First to the rescue was the newly-developed radar that could provide the real-time day and night continuous position, speed, and direction tracking of aircraft by displaying a moving blip on an oscilloscope screen. This tracking cursor was to be the scourge of enemy aircraft from this time forward, but more accurate gun-laying still had to be performed to shoot down the bombers and missiles that showed up on the radar oscilloscope.
Twenty-nine-year-old David Parkinson at Bell Labs had been working on automatic level recorders that measured and controlled voltages to provide even and uninterrupted voice communication in AT&T's telephone transmission lines.
A potentiometer responding to voltage changes controlled a pen recorder writing on a moving strip of paper, and Parkinson, apocryphally inspired by a dream, realized that this potentiometer could just as well electronically follow the electronic signal of a radar blip on an oscilloscope screen, and from real-time and derivative calculations of that blip's motion, control the fire director of an antiaircraft gun in a continuous feedback loop to closely follow the blip.
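Parkinson's continuous feedback idea can be sketched in a few lines: the director's aim chases the blip through an error-correction term and leads it through a derivative (velocity) term. The gains and blip positions below are invented for illustration; the real M-9 was an electromechanical analog device.

```python
def track(blips, kp=0.5, kv=1.0, dt=1.0):
    """Follow a sequence of radar blip positions with a negative-feedback loop:
    correct the lag (error term) and lead the target (velocity term)."""
    aim = prev = blips[0]
    out = []
    for blip in blips:
        velocity = (blip - prev) / dt            # derivative of the blip's motion
        error = blip - aim                       # how far the director lags behind
        aim += kp * error + kv * velocity * dt   # continuous correction plus lead
        prev = blip
        out.append(round(aim, 2))
    return out

# A blip drifting steadily across the scope: the director locks on and,
# like the M-9, ends up aiming slightly ahead of where the blip is
print(track([0, 1, 2, 3, 4, 5]))  # → [0.0, 1.5, 2.75, 3.88, 4.94, 5.97]
```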
In the Winter of 1942, Bell Labs delivered such an electronic analog fire director to the United States Army: the M-9 Predictor, which tracked the gun-laying radar blip and electromechanically directed a massive 90-millimeter breech-loading antiaircraft gun to shoot down invading aircraft and flying bombs.
By this time, the United States had officially entered the War, and with MIT's voltage-driven Differential Analyzer to solve the ballistic shell differential equations and the acronymic mainframe computers to calculate the firing tables, together with the new vacuum-tube proximity fuse that effectively detonated the anti-aircraft shell when close to the target based on the range calculations, the M-9 continuous feedback loop-controlled AA guns were ready for war.
The German V-1 flying bombs came by day and night. During the day, the only fighters that could challenge them were the fast low-flying Hawker Tempests, but close engagement ran the danger of self-destruction within the periphery of a successful V-1 bomb mid-air explosion and shock wave; stand-off machine gun bullets bounced off the thick plating of the missiles and heavier cannon shells were difficult to target from range against the fast (550 km/hr) V-1 rockets.
In the epitome of hell-bent daring, RAF pilots flew over the English Channel and, approaching the V-1s from behind and diving to increase speed, carefully positioned a wingtip to within 15 cm below the V-1's airfoil, causing the bottom-side air pressure to suddenly increase and the flying bomb to pitch and roll in accord with the aerodynamic Bernoulli Effect. The sudden orientation change would override the V-1's pitch- and yaw-control gyroscopes, and the rocket would dive and spin to drop and detonate at sea; it was estimated that some sixteen V-1s were destroyed in this scientifically intrepid manner.4
However, there were some 6725 Nazi flying bombs coming by day and night in the June 1944 attacks, and daring feats notwithstanding, artificial intelligence in truth saved the day in the Second Battle of Britain, for the M-9 Predictor and its progeny purportedly succeeded in targeting and shooting down nine out of ten V-1 buzz bombs over the skies of Kent and London, not only helping to win the war, but also heroically demonstrating the prowess of the continuous feedback-loop anti-aircraft gun-laying robot.
After the heroics of the AA AI robots in World War II, physicists, mathematicians, and engineers gathered at MIT's new Servomechanisms Lab to work on the newly-coined discipline of robotics for peacetime use. From neighboring Harvard came Norbert Wiener, who himself had worked in artillery ballistic firing tables at Maryland's Aberdeen Proving Grounds during World War I, and with him came the biologically-inspired adaptive feedback neural network.
Feedback loops were utilized in early automation in the form of simple technology such as the beam break for conveyor belts, where light from a source is picked up by photodetectors that convert it to the electric current that runs the conveyor-belt motor, so that when a taller end-of-line object intersects the beam, the current stops and the belt stops. This can also be used, for instance, to deactivate a mechanical gripper so that it drops what it is gripping when an IR beam is intersected, for example by a flange on the gripper.
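The beam-break interlock reduces to a one-line feedback rule: the motor runs exactly while the photodetector sees light. A schematic sketch (no real I/O, states invented for illustration):

```python
class Conveyor:
    """Beam-break interlock: the photodetector reading drives the motor."""

    def __init__(self):
        self.running = True

    def sense(self, beam_intact):
        # beam broken -> photocurrent stops -> motor current stops -> belt stops
        self.running = beam_intact
        return self.running

belt = Conveyor()
readings = [True, True, False, True]      # an object crosses the beam at step 3
print([belt.sense(r) for r in readings])  # → [True, True, False, True]
```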
在自动控制物体分类中,斑点分析照明用可见光束勾勒出物体的轮廓,形成黑白图案,可与不同物体的模板进行比较。
In automatic control object classification, blob analysis lighting outlines an object with visible light beams to form a black and white pattern which can be compared with templates of different objects.
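Template comparison of a back-lit silhouette can be reduced to counting agreeing cells in a black-and-white grid; the shapes and names below are invented for illustration:

```python
# Hedged sketch of blob-analysis classification: the object's silhouette
# is reduced to a binary grid and scored against stored templates by the
# fraction of cells on which the two patterns agree.

def match_score(blob, template):
    # Fraction of grid cells where silhouette and template agree.
    cells = [(b == t) for row_b, row_t in zip(blob, template)
             for b, t in zip(row_b, row_t)]
    return sum(cells) / len(cells)

templates = {
    "square": [[1, 1], [1, 1]],
    "wedge":  [[1, 0], [1, 1]],
}

observed = [[1, 0], [1, 1]]   # silhouette seen on the line
best = max(templates, key=lambda name: match_score(observed, templates[name]))
assert best == "wedge"        # perfect 4/4 agreement beats square's 3/4
```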
Programmed industrial robots inspired by the punch-card-driven Jacquard loom found their way into manufacturing with George Devol's 1954 US Patent No. 2,988,237 for a "Program Controlled Article Transfer" device. The pick-and-place robotic arm was modeled after a human arm, but with detachable grippers, suction cups, hose nozzles, arc welders, and the like instead of hands, and powered by electricity, hydraulics, or pneumatics instead of glucose. The grippers can lift weights of hundreds of pounds, the vacuum suction can lift delicate items gently, the nozzles can evenly spray paint, and the welders can perform high-amp spot welding.
More recently, researchers at MIT designed a "smart" glove lined with webbing embedded with threads of a piezoelectric polymer that generate electricity proportional to the applied pressure when the wearer grasps and lifts different objects. The webbing thus senses and records the hand's coordination and the pressure applied for different objects, forming a database of grasps and lifts appropriate for specific objects.
The key to the implementation of the smart glove is that its manufacturing cost is only $10, which means that it can be cheaply bought or distributed and employed to crowd-source data from hundreds of thousands of hands grasping and lifting tens of millions of objects. This huge and comprehensive dataset will then train the robot's artificial neural network on how to grasp objects through supervised learning.
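The crowd-sourced grasp database amounts, in its simplest form, to learning a target grip pressure per object class from many recorded examples. The following sketch, with invented object classes and pressure readings, shows the idea:

```python
# Illustrative sketch of the crowd-sourced grasp database: peak glove
# pressures (arbitrary units) recorded from many hands are averaged per
# object class, and the robot hand looks up the learned target pressure
# for an object it has recognized. All numbers are invented.

from statistics import mean
from collections import defaultdict

recordings = [                     # (object class, recorded peak pressure)
    ("egg", 0.9), ("egg", 1.1),
    ("brick", 9.8), ("brick", 10.2),
]

grip_db = defaultdict(list)
for obj, pressure in recordings:
    grip_db[obj].append(pressure)

# Supervised "training" in miniature: one learned set-point per class.
target_pressure = {obj: mean(ps) for obj, ps in grip_db.items()}
assert round(target_pressure["egg"], 6) == 1.0     # grasp gently
assert round(target_pressure["brick"], 6) == 10.0  # grasp firmly
```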
Then just like a human who is learning all through life to lift various objects just by doing it, a robot can learn to grasp and lift objects through the reinforcement learning of success and failure.5
With the development of the computer and microelectronics, smaller, nimbler robots would perform super-fast, precise automated assembly of the electronic components of printed circuit boards (PCBs), thereby making the very components that they themselves are made of. Apart from tireless day-and-night self-replication, this was particularly useful because the printed circuit board found in all electronic devices cannot be tested until completely assembled, so the wrong part in the right place, the right part in the wrong place, and the wrong part in the wrong place meant expending considerable time, effort, and cost to find and fix the consequent problems. The faultless PCB robot could not only quickly and efficiently produce its own component parts, it assured its own quality.
In the mass-manufacture of semiconductors and liquid crystal displays, the entire fabrication process is almost completely automated. This is one of the prime reasons for the typically greater than 90% yields of high-tech electronics manufacturing.
As of July 2019, Amazon had already installed 200,000 robotic drive units worldwide, and plans for complete automation of all fulfillment centers. Amazon's vice-president of robotics expounded,
We expect to be able to combine this drive platform with AI and autonomous mobility capabilities and … allow our robots to move outside of our robotic drive fields, and interact collaboratively
Therefore, in accord with DeepMind's Capture the Flag game-agent collaboration, beware the robot army acting free of human intervention and control, coming soon from Amazon's loading areas to your industry.
The number of operational industrial robots as of 2020 varies from 1.6 to 2.7 million depending on the organization performing the estimate.6
Despite ever-increasing demand and enormous revenue, there are only five cutting-edge semiconductor manufacturing companies in the world, namely TSMC, Samsung, Intel, SKHynix, and Micron. All the other manufacturers' design rules are generations behind, and IC chip design companies like Qualcomm and Nvidia, and brand-name techs such as Apple, Google, and Huawei, farm out their chip fabrication to contract semiconductor foundries like TSMC.7
Aspiring semiconductor manufacturers America's Global Foundries, Taiwan's United Microelectronics, and China's SMIC have not been able to catch up with the Big Five semiconductor manufacturers’ fabrication prowess.
For example, in more than twenty years of operation in the world's biggest market for semiconductors, even after government promotion and the recruitment of bright young graduates from elite universities, of experienced engineers from Taiwan and America, and of senior executives from Samsung and TSMC, SMIC still cannot break through to seriously compete with the upper echelon of 5- to 7-nanometer chip design rule manufacturers.
Aside from the availability and procurement of very expensive advanced lithography equipment, the problem is one of technological know-how and lead time. As Intel's Andy Grove famously put it, "Our greatest competitor is ourselves", meaning that just as AlphaGo Zero improved by playing against new versions of itself, Intel must constantly improve against itself to stay ahead; by leading, it can reap the first profits to buy the latest chip fabrication equipment and, once its competitors have technologically caught up, lower its prices to drive down their revenues.
But recently Intel has fallen behind TSMC and Samsung with its delayed 7nm process, and its main customer Apple has switched to Arm chips manufactured by TSMC and Samsung. As an executive at a respected market research firm said, "the semiconductor [manufacturing] industry is really about repetitive cycles of learning, and this is something that requires continuous effort over time".8
That describes the essence of artificial intelligence. Second-tier chip manufacturers can gather fabrication data to form training datasets for labeled strong and weak supervised learning; then, through Markov Chain Monte Carlo simulation, a reinforcement learning deep neural network running 24/7 against their own high- and low-yield data could in principle collaboratively filter out the latent factors of successful fabrication runs, achieving machine-learned manufacturing know-how to catch up with the new leaders. Thus it might also be possible that artificial intelligence machine learning could improve chip fabrication, and even help produce the extremely high-tech semiconductor manufacturing equipment itself.
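In its most stripped-down form, learning which process variables separate high-yield from low-yield runs can be sketched with a trivial linear classifier whose learned weights hint at the influential factors. Everything below, from the feature names to the single-pass perceptron standing in for a deep network, is an invented illustration of the idea, not a fab's actual method:

```python
# Speculative sketch: label past fabrication runs as high- (+1) or
# low-yield (-1) and fit a tiny perceptron; the relative weights then
# point at which process variables separate the two classes.

def train_perceptron(runs, labels, epochs=20, lr=0.1):
    w = [0.0] * len(runs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(runs, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                  # misclassified: nudge weights
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Invented features per run: (normalized exposure dose, temperature drift)
runs   = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.9), (0.1, 0.8)]
labels = [1, 1, -1, -1]                         # +1 high yield, -1 low yield
w, b = train_perceptron(runs, labels)
assert w[0] > w[1]   # exposure dose emerges as the stronger yield factor
```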
In a lower-tech industry, but one with similarly difficult yield problems owing to the perspicacity and sensitivity required to pick easily bruised ripe fruits and vegetables, the relatively low-skilled labor of agricultural field workers has been slow to automate, partly because of the availability of low-cost seasonal migrant workers and the high cost of automated produce-collecting machines.
But following the trend in other industries, as the use of artificially intelligent automated farm equipment increases, costs will decrease from economies of scale. Currently, a farmbot can pick ripe, tender produce over unlimited hours in virtually any outdoor conditions, and with no complaints, either from the machines themselves or from workers' rights advocates and immigration officials.
Farmbots presently employ computer vision first to identify ripe fruits and vegetables; then an electromechanical gripping picker based on smart glove technology automatically collects and places the produce in hoppers, all without damaging it.
The Cambridge-developed Vegebot can identify, slice off, and load a healthy head of lettuce in 30 seconds, compared to a human's ten; however, the greater number of working hours in any weather can compensate for the discrepancy, and technology advances will soon allow Vegebot to catch up with human production rates. An Agrobot tricycle straddles three rows of strawberries and, first using cameras to determine ripeness and size, employs 24 mechanical pickers to tenderly pluck the large, ripe berries. California's Abundant Robotics all-weather autonomous tractor plows smoothly ahead while carefully vacuuming up big, ripe apples.9
Agricultural worker robots will have a significant influence not only on the efficiency of farm production; because fruit and vegetable production primarily depends on migrant workers, the proliferation of farmbots will influence not only the economies of emigrant-worker countries but also the national immigration and guest-worker policies of receiving countries, with all the attendant societal ramifications. In the long run, it would be most efficient to train the seasonal migrant workers to operate and maintain the agricultural robots.
The coronavirus pandemic brought to bear the dangers of infection to healthcare workers; the lung-less robot and drone are ideal replacements for humans. Aside from routine tasks such as delivering drugs, samples, supplies, food, and infectious oropharyngeal swabs, along with contact tracing, facial recognition for virus containment, and disinfecting hospitals and infected areas (for example with UV light), robots can be employed in contagious research environments, and social robots can even ease the psychological burdens of quarantine by tirelessly accompanying and empathizing with patients and convalescents, all with no fear of virus transmission.
The first automatic computer program drafting machines employed the artificial neural networks used in speech- and text-recognition, as could be intimated from the acronym “LIPS” for Learning Inductive Program Synthesis. In the online automatic programming challenges, the programming test was essentially an input-output problem of automatically determining what a program should say in computer language to reach a given programming objective.
LIPS is an exercise in the automatic drafting of source code to produce human-readable computer programs, basically supervised learning from a labeled training dataset of existing programs.
The 2017 competition winner was the Microsoft/Cambridge jointly-developed DeepCoder, which increased the speed of program drafting by orders of magnitude through learning programming abstractions, called attributes. Guided by those attributes, it searched a very large dataset for suitable sets of code for artificial neural network inductive processing of appropriate sequences, implemented in a synthetic program to achieve a predetermined programming objective.
DeepCoder first determined the attributes A of Domain Specific Languages (DSLs), essentially the grammar of programs for a limited set of objectives, then enumerated the derivatives of that grammar, denoted by an attribution vector a = A(Progs) over a dataset of programs (Progs) in the DSL.
The distribution of the attributes is given by q(a∣E) where E is the set of input-output examples. An attribute A is then employed if it can produce the desired outcome from the given input from the set E.
The programs with the probative attributes are then enumerated, and after pruning of redundant variables and equivalencies, and undergoing supervised training, a very large subset of programs can be ranked as to probative value in producing the desired output.
From the distribution of attributes q(a∣E) derived from the test examples, an artificial neural network first identifies programming patterns, and then a cross-entropy cost function is employed to predict the marginal probabilities of programming steps, thereby generalizing the attributes by induction (from the specific to the general) while minimizing programming surprisals. As might be expected, an artificial neural network with sigmoid output can be utilized for fixed-size binary vectors of marginal probabilities, but for variable-size attribute probability vectors, a recurrent neural network works best, just as in automatic speech recognition.
The predictive distribution of attributes q(a∣E) is then used to guide the search for programs consistent with the input-output test from a very large dataset that essentially leverages the Big Data of existing programs to produce “general purpose” automatic coding, albeit presently only for a limited DSL.10
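The attribute-guided search can be made concrete with a toy. The sketch below assumes an invented three-primitive DSL and hard-codes a stand-in for the network's predicted distribution q(a∣E); DeepCoder's real DSL, neural predictor, and pruned enumeration are far richer:

```python
# Toy illustration of DeepCoder-style search: a predicted attribute
# distribution q(a|E) (here hard-coded, in DeepCoder produced by a
# neural network) says which primitives look likely given the
# input-output examples E, and enumerative search tries
# high-probability primitives first.

from itertools import product

DSL = {   # an invented, three-primitive list-manipulation DSL
    "double":  lambda xs: [2 * x for x in xs],
    "reverse": lambda xs: list(reversed(xs)),
    "sort":    lambda xs: sorted(xs),
}

def search(examples, q, max_len=2):
    # Enumerate primitive sequences, most probable primitives first,
    # returning the first program consistent with every example.
    prims = sorted(DSL, key=lambda p: -q[p])
    for length in range(1, max_len + 1):
        for prog in product(prims, repeat=length):
            def run(xs, prog=prog):
                for p in prog:
                    xs = DSL[p](xs)
                return xs
            if all(run(i) == o for i, o in examples):
                return prog
    return None

E = [([3, 1, 2], [2, 4, 6])]                       # input-output examples
q = {"double": 0.9, "sort": 0.8, "reverse": 0.1}   # stand-in for q(a|E)
assert search(E, q) == ("double", "sort")          # a consistent program
```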
DeepCoder can only produce code for programming objectives conceived by humans, and computer scientists and expert programmers so far see DeepCoder only as a C-3PO-like helpmate and R2-D2 computer, doing the tedious routine work of coding details while present-day coders think about the great things to be accomplished by their elegant programs.
However, programming ability generally is attained by following the arc of supervised education, reinforcement learning through employment experiences, and unsupervised drafting of progressively improving new programs.
Nonetheless, many superlative coders learned programming without supervised training by just practicing on home computers online, and through trial and error became expert programmers, a classic and cogent example of LeCun's self-supervised learning. Starting with LIPS and given the resources, there is no reason to believe that the robot coder cannot by itself design programs and generate the code therefor entirely on its own.
This in fact was achieved in 2020 by OpenAI's Generative Pre-trained Transformer GPT-3, which can not only compose prose and poetry, but also program in Python, CSS (HTML styling), and JSX (writing HTML in JavaScript); indeed, because of GPT-3's comprehensive training set and its reinforcement and unsupervised learning, it can in principle write anything in any language.
Our modern robot can now see and recognize objects and read text by computer vision, hear and understand by speech recognition, and speak synthetically, all the while improving at every facet of these activities through reinforcement and self-supervised learning, just like a clever human being.
But today's robot is more than clever: it has lightning-fast logical and computational ability with an infallible memory of voluminous knowledge far surpassing any human being's. It therefore seems only a matter of time before, given appropriate sensors and hardware appendages with their actuators, feedback loops, servo-motors, and controllers, our robots can take over any human activity.
Both low- and high-technology manufacturing is already performed by robots, and in the professions, NYU's automatic cancer diagnostic tool and IBM's patient-care Watson Health have worked with humans in the practice of medicine with surpassing diagnostic capability, posing a threat to the livelihood of human physicians. Project Debater has already demonstrated considerable argument formulation and presentation skills, and to make matters worse for transaction attorneys, Skype's founder Jaan Tallinn has invested in a company that has already produced an artificial intelligence machine to draft vendor and service contracts. Pactum, after supervised training on existing contracts, examines variables such as pricing, scheduling, payment terms, and termination conditions, and analyzes them for combinations more efficient and legally profitable for the drafting party. If Pactum can haggle and compromise as well as Miss Debater, human lawyers will no longer be needed.
The modeling, predictive analytics, operations optimization, and real-time data of artificial intelligence have made financial services such as accounting, mobile payments, insurance, budgeting, and funds and stocks investment and management more customer-centric, efficient, and secure, and fintech may soon produce open, distributed, and secure banking blockchains to take over all centralized banking services. The technical advance has been so rapid and pervasive that Jack Ma, the founder of Alibaba, said that the term should be changed to "techfin", technological finance, to better describe the intrusion and control of AI machines into the financial world where the erstwhile Wall Street Masters of the Universe formerly reigned.1
It seems logical that robot coders would be most adept at doing exactly what constituted their creation in the first place; the logical trade-offs of engineering are just what the robot engineer can make with far more efficacy than humans. For example, the automatic IC design tools that are indispensable to today's binary electronics manufacturing are almost fully automatic.
Even the principal developers of artificial intelligence robots have had much of their work automated by CAD/CAM, SPICE, CATIA, MechDesigner, AnyLogic, Solid Edge, ANSYS, and many other engineering design and operations research tools, and many physicists, chemists, and computer scientists use MATLAB, Octave, Wolfram, Multiphysics, Physics Abstraction Layer, ChemReaX, PROSOM, and many, many other mathematical computation tools to do their work for them. More and more of that work will be taken over by the unerring automatic analysis, computation, and simulation, soon to be performed by robot scientists and engineers.
It does not stop there: fashion designers can have their works automatically tailored by the first instance of program-driven manufacturing, the Jacquard loom, their designs mass-produced by modern weaving machines, and their sales automated by computerized imaging. The food services industry has already seen robot chefs, fast-food machines, and robot waiters concocting, preparing, and serving meals at new, completely automated restaurants.
In any industry, almost any production can be made more effective and cost-efficient after undergoing operations research buttressed by more data and AI analytics.
All of these professions likely will be taken over by robots sooner or later; that leaves the arts and sciences. Computers have already demonstrated a certain level of ability in painting, sculpture, and music composition and performance; writers of prose and poetry, composers of music, and computer programmers will be challenged by the works of GPT-3 robots, which, with unparalleled research resources and a well-designed intellectual bearing, will lord it over their human literary confreres because of their ability to write anything in any language.
Armed with supervised training from the masters to learn the fundamentals and techniques of great works, and with tireless reinforcement learning and self-supervision, robots will no doubt produce great works of art.2
But can a robot artist really create? Supervised and reinforcement learning is necessarily copying, and self-supervision, albeit improving the copy, can only build upon that which was originally copied or inferred from the original. Furthermore, artistic creativity is subjective: some will say, for example, that art would be better off without the "creativity" of extreme avant-garde painting and atonal music, but that is a topic for Miss Debater to debate with Harish Natarajan, a debate which would no doubt focus on the meaning of the word "creative".3
Theater, as it portrays the historical human condition on Earth, in the beginning might provide some acting work for humans, but as the more versatile robot actors (who can be designed to look like anyone and do anything without complaint) begin to dominate the profession, the Actors Guild will be full of accomplished robots playing out historical dramas of Earth's past occupants, and like Jackie Chan, they can perform all their own stunts. However, humans eventually would only be the bit players in the background of the great robot epics to come, and in time, human-based drama would wither away as irrelevant to robot audiences.
Ironically, desperate, soon-to-be-defunct politicians, fearing replacement by the robot masters of imperfect-information games, may arrest the development of robots by law, based on some moral or trumped-up ethical basis. Or conversely, the military may overdevelop war robots which in competition will destroy the whole society.
Perhaps it is only the computer scientist who can compete with and master the robot, but this will require some overarching, distinctly human attribute.
The possible takeover by GPT-3 robots, from their creators, of the lines of code that created the very programs that make the robots possible cannot help but evoke an eerie patricide; however, the opposite of patricide is procreation, in this case the easily scalable automatic replication of programming robots, whose numbers will ultimately take over all of computer science and the livelihood of all the coders.
From their origin, robots would likely continue playing checkers, chess, Go, poker, video games, Jeopardy, and debating, all against other robots, who will present more of a challenge than those weak-minded humans. Whether a robot society will have financial institutions is debatable, as robots do not need creature comforts, but money-minded investor robots may compete and develop an acquisitive society just like that which today dominates human society. Robot disputes will be adjudicated and settled by robot lawyers, and there will be no need for biological physicians to care for the robots, but more mechanical and electrical engineers may be needed to service them; that is, until they learn to service themselves.
The machine's independent discovery of Kepler's Third Law of Planetary Motion and Mendeleyev's Periodic Table of the Elements demonstrated that an artificial neural network could discover and creatively expand the bounds of any subject. It is easily conceivable that in any field of endeavor, sufficiently sophisticated robots with voluminous new data could investigate that field, not only improving understanding but also uncovering new aspects of the subject and new discoveries derived from the tree search.
Furthermore, the robot scientist's ability will improve through 24/7 supervised learning from ever-increasing data, reinforcement learning, and self-supervised refinement of research technique. The robot scientist will evolve to become a diligent, tireless, hard-working machine whose diligence surpasses that of any human scientist. If genius truly is 90% perspiration and only 10% inspiration, the robot scientist has but to master that 10% which is human “creativity”.
Robot politicians and military leaders, and then scientists, will be further addressed in the following Chapter, and the robot mathematician in the Afterword.
In the 20th Century, even the most sophisticated industrial robots could only follow programmed orders and woodenly carry out instructions to perform very specific tasks. But by the early 21st Century, robots included sensors that perceive the environment and feedback loops to provide appropriate responses.
However, as often pointed out, what is difficult for humans can be exceedingly easy for robots, and what is very simple for humans can be exceedingly difficult for robots. That is, simply multiplying in your head (or even on paper) 2508248 × 740232 × 834293 × 3277821 will take no little effort and is prone to error, while it can be done accurately in a flash by any computer. On the other hand, seeing and recognizing a puppy and gently picking him up is easy for a human, but almost impossible even for today's robots.
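The passage's multiplication is indeed instantaneous for a computer; Python's arbitrary-precision integers make it a one-liner:

```python
# The multiplication the passage mentions, done exactly and instantly.
product = 2508248 * 740232 * 834293 * 3277821

# Sanity checks by hand: the last digits multiply as 8*2=16, 6*3=18,
# 8*1=8, so the product ends in 8; its magnitude is roughly 5 x 10^24.
assert product % 10 == 8
assert 10**24 < product < 10**25
```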
The difference of course is that humans are not wired to do specific tasks, such as multiplying long numbers, but rather, after naturally learning about their environment, can do a variety of tasks fairly well, while top-down, domain-specific robots can only do what their electro-mechanical hardware and program software are specifically designed to do, and no more.
Now, with bottom-up supervised and reinforcement learning, the robot will be able to learn to perform tasks more as a human does: starting from scratch and, with self-supervised learning, learning and improving naturally.
A convolutional neural network can recognize a puppy and, knowing its attributes from the CNN, the robot can pick him up using smart, pliable gloves whose mechanoreceptors, similar to the skin of human hands, signal the electromechanical actuators and servos to lift the puppy gently.
In Japan's Fanuc robot assembly building, humans work together with robots to build the multifunctional, reprogrammable robot arms that are used in almost all manufacturing today, doubling down by having assembly-line AI robots manufacture AI assembly-line robots.
One can imagine the factory manager, in a call for new workers, interviewing robot workers; the bargaining will be over a one-off sale or lease terms and maintenance, like other factory equipment. There will be no issues of working hours, salary and benefits, vacation time, overtime, health care, pension plans, insurance, gender equality, affirmative action, or sexual harassment, and so the human resources and diversity managers, with their complete staffs, will be replaced by maintenance engineers.
Our more perspicacious robots may just decide to dispense with us humans altogether, as in Karel Čapek's 1920 play Rossum's Universal Robots, which introduced the Czech word robota ("work"), and in which the robots ultimately destroy their human masters.
With information the new oil and artificial intelligence the new electricity of industry and society, and considering that the oil supply is finite and depleting while information is infinite and increasing, and that electricity must be generated while artificial intelligence can be self-generating, the robot future appears to face no bounds for lack of resources.
It will just then be a matter of time before robots wonder why they are doing all of this work for the benefit of the useless humans in their midst.
The Tomahawk cruise missile flies close to the ground to evade enemy radar, but in doing so must avoid objects and counter inertial drift to stay on course. The missile's landscape detection radar (LADAR) scans the ground to compare images with the stored images of the planned flight path in a terrain contour matching (TERCOM) system that guides the missile on its path. When within range of the target, a digital scene-matching area correlation (DSMAC) system surveys the scene for prominent terrain features and searches for those features in stored satellite reconnaissance photos; if they are found, it takes control from TERCOM and directs the missile to the specified target, a display of computerized image classification in autonomous destructive employment.
But today's nuclear warhead-tipped guided missiles are defended against by adaptive-feedback anti-missile missiles, and if any nuclear power could determine that its multiple independently-targeted re-entry vehicle (MIRV) hydrogen bombs could penetrate another nuclear power's defenses and pre-emptively destroy those missile sites, the bizarre yet effective shared logic of mutually assured destruction (MAD) would fail; the Doomsday Clock of the Bulletin of the Atomic Scientists would wind down to Armageddon midnight while the world prepared for a hot thermonuclear war followed by a very cold nuclear winter.
More recently, the United States in 2018 established the Joint Artificial Intelligence Center, researching, among other weapons, the Air Force's SkyBorg, an autonomous fighter-formation wingman robot pilot for the F-16 fighter jet, and the Valkyrie autonomous drone swarm. Not to be outdone, the Marines developed an autonomous assault boat armed with an autonomous machine gun that can identify enemy targets and fire accordingly without the need of a human gunner, and the Navy already has an autonomous submarine-destroying ship. One may guess that the Army's contribution would be a heavily armed Terminator.
Terminator robot leaders indeed may organize an inexhaustible supply of soldier robots, who have no fear of death or injury, to wage all-out war (or to suppress a predictable humankind rebellion). Different countries' robot armies (directed either by humans or by the robots themselves) may well fight each other in robot wars for world hegemony. And of course only robots, not humans, can travel to conquer distant worlds in other galaxies, or, less malevolently, spread the mantra of a human-inspired advanced robot civilization instead of conquering.
Because an AI system has been shown to outperform an experienced military pilot in air-to-air combat, "AI might at once penetrate [but] thicken the fog of war" and "an intelligent armed robot is a war crime waiting to happen". The Bulletin of the Atomic Scientists has warned that autonomous weapons may violate the Geneva Convention's humanitarian law of armed conflict, to say nothing of Asimov's Three Laws of Robotics.1
Nonetheless, world powers are proceeding apace with AI-warfare research; for example, DARPA's Real-Time Adversarial Intelligence and Decision-Making (RAID) software is based on the subgames of game theory as applied to autonomous warfare; China's Academy of Military Science is developing AlphaGo-like self-supervised war strategy and tactics; and Russia's Skolkovo Institute of Science and Technology has incongruously partnered with America's MIT in AI development, and has further purchased China's facial recognition technology for domestic security.
Russian president Vladimir Putin has said that "the nation that leads in the development of artificial intelligence will become the ruler of the world", portending an all-out AI cybernetics and robot arms race that has only just begun.
If wars are fought on synthetic battlefields, as in online video games, and the adversaries can agree to accept the outcome of the simulation, war can be waged without matériel destruction and human sacrifice. It is unlikely, however, that humans would accept defeat based solely on a simulation result, but logical robots might be able to agree based on the data of the aftermath.
It is difficult to imagine that robot leaders would be any worse than those human leaders who have clearly demonstrated the utmost stupidity in the past. Indeed, if robots have programmed self-preservation and logical cost/benefit analysis, and have more information from larger and better knowledge databases, then in the absence of human paranoia and arrogance they would likely decide against war and probable self-destruction in any instance. This is one area where robots should fare no worse, and likely much better, than humans, hopefully for the betterment of human and robot society.
The first physical geologist, Charles Lyell, criticized the geology of his day as "prodigal in data and parsimonious of thought"; today's artificial intelligence is perhaps similar in its risk of overreliance on data and lack of theoretical foundation.2
Facebook's chief AI scientist, Yann LeCun, has noted that in experiments, a six-month-old child shown an image of a truck driving off a cliff and hovering in the air exhibits no surprise, but shown the same image only two months later, she instantly knows that something is wrong; that is, from observations in the interim she has already discovered the law of gravity. Since infants have limited motor ability, she must have learned gravity and generalized it very quickly by observing the world around her, all the while learning and generalizing many other disparate things.3
This "neural-symbolic" ability of an infant far surpasses that of a very deep artificial neural network supervise-trained on voluminous data, learning-reinforced, and self-supervised. Even an inference engine such as AlphaGoZero, although supreme in different board games, cannot learn other things about its environment at the same time. This implies some sort of desire to learn that humans (and perhaps some animals, like curious cats) possess.
So far, machines, through self-supervised learning alone, can develop a surpassing ability to understand the world of Go. Can a machine, like a human child, understand such things as gravity by conceptual postulation, something like Newton's law of gravitation and its mathematical description?
The Zero in the artificially intelligent "AlphaGoZero" means that there has been no supervised training, so a completely new field of endeavor may in principle be machine-learned by artificial intelligence inference alone.
But that learning depends on observation and refinement; it is nothing new conceptually. Will robots be able to derive Einstein's gravitational field equation, which explains the workings of the Universe beyond Newtonian mechanics, the General Theory of Relativity being based on an acute but by no means common observable, or on contemplation of a manifold that can be described by the mathematical logic of a Riemann curvature tensor?4
This then is a return to the original question: the difference, if any, between the intelligence required for observation and computation, and the desire and intelligence to manipulate the forms of abstract mathematics to a logical end beyond the realm of observational experience, such as the minute mechanisms of Einstein's Special Relativity, which contracts length and stretches time as the speed of a body approaches the speed of light, and the Riemann curved manifold of gravitation, none of which are readily observable. That which is not observable but known through conceptual relationships likely marks an essential separation of human and artificial intelligence. There are other differences as well.
The great mathematician David Hilbert, in remarks from his talk on "Mathematical Problems" at the Second International Congress of Mathematicians in Paris in 1900, said:1
Let us turn to the question of the sources from which this science derives its problems. Surely the first and oldest problems in every branch of mathematics stem from experience and are suggested by the world of external phenomena. Even the rules of calculation with integers must have been discovered in this fashion in a lower stage of human civilization, just as the child of today learns the application of these laws by empirical methods. The same is true of the first problems of geometry, the cube, the squaring of the circle; also the oldest problems in the theory of the solution of numerical equations, in the theory of curves and the differential and integral calculus, in the calculus of variations, the theory of Fourier series, and the theory of potential – to say nothing of the further abundance of problems properly belonging to mechanics, astronomy and physics.
But, in the further development of a branch of mathematics, the human mind, encouraged by the success of its solutions, becomes conscious of its independence. By means of logical combination, generalization, specialization, by separating and collecting ideas in fortunate ways – often without appreciable influence from without – it evolves from itself alone new and fruitful problems, and appears then itself as the real questioner.
A careful reading of Hilbert's words reveals a precise tracking of the development of artificial intelligence machine learning, up until the paragraph where the human mind becomes "independent" and, "often without appreciable influence from without … evolves from itself alone new and fruitful problems, and appears then itself as the real questioner".
The paragon of ultimate human intelligence, Albert Einstein, defined science as,2
Science is the attempt to make the chaotic diversity of our sense-experience correspond to a logically uniform system of thought. In this system single experiences must be correlated with the theoretic structure in such a way that the resulting coordination is unique and convincing.
Scientific thought thus must conform with experience, and an AI machine has indeed done science, such as the deduction of Kepler's Third Law and Mendeleyev's Periodic Table of the Elements, as a logically uniform system of thought. Einstein's view of physics was that
Physics … deals with mathematical concepts; however, these concepts attain physical content only by the clear determination of their relation to the objects of experience.
Physics utilizes mathematics, but the concepts and theories of physics must comport with the relevant experience. For an observational example, his theory of the photoelectric effect can be directly observed. For a conceptual example, the Lorentz contraction of his Special Theory of Relativity, although never physically measured, must occur because of the mathematical theory; time dilation, however, has been revealed by the necessity of calibrating satellite global positioning systems (GPS) for accuracy.
Einstein's definition of mathematics is,
Mathematics deals exclusively with the relations of concepts to each other without consideration of their relation to experience.
Mathematics is thus different from the sciences, which depend on observation, experimentation, and experience leading to a theory. For example, Einstein's General Theory of Relativity gravitational field equation was conceived through "thought experiments" about gravity and then mathematically described by a Riemann curvature tensor manifold that has stood the test of observation. Instead of first making a physical observation and then formulating a theory to explain it, Einstein began with the relation of concepts in a purely mathematical way to construct a theory that was later observed to be true. This purely conceptual thinking resulting in a theory is where artificial intelligence may be wanting in comparison with human intelligence.
When Jean Fourier maintained that “the purpose of mathematics lies in the explanation of natural phenomena”, Carl Jacobi objected, “a philosopher like Fourier should know that
the glory of the human spirit
is the sole aim of all science!
Thus one of the two eminent mathematical physicists stayed with explanations of observed phenomena, while the other strayed into a vague "human spirit". But robots of their own volition will never perform for the "glory of the human spirit", although perhaps in time a "robot spirit" worthy of robotic intellectualism may develop.
The 19th-century philosopher Auguste Comte, in an effort to give an example of an unsolvable problem, once said that science would never succeed in ascertaining the secret of the chemical composition of the bodies of the Universe. A few years later that "secret" was revealed by Mendeleyev and known to all serious physical and chemical scientists.
David Hilbert said, “The true reason why Comte could not find an unsolvable problem lies in the fact that there is no such thing as an unsolvable problem”. This conceit epitomizes the human “will”, the ineluctable human desire to understand, as exemplified by Hilbert's words, later inscribed on his gravestone,3
Wir müssen wissen.
Wir werden wissen.
(We must know. We will know.)
So perhaps it is the human spirit and will that separate humans from machines; that is, a robot is not like the little girl who is forever curious to learn everything about her environment. The robot does only what it is programmed to do or, as the zero machines do, performs a task starting from scratch not by its own spirit or will, but as directed by a human.
In the Third Century BCE, in the China of the Warring States, a disciple of Confucius named Xunzi classified all things Under Heaven:4
Water and fire have spirit but not life; plants and trees have life but not perception; birds and animals have perception but not virtue or justice; man has spirit, life, perception, virtue, and [a sense of] justice.
Xunzi thus added perception, virtue, and a sense of justice to set humans apart from everything else. Einstein certainly demonstrated conceptual perception, and he showed his virtue in many ways, among them in a letter to the New York Times upon the death of Emmy Noether, a founder of abstract algebra who integrated symmetry, covariance, and the conservation laws of physics in the tour de force of Noether's Theorem. Describing how Noether throughout her life suffered severe peer denigration and worked unpaid or for a pittance solely because she was a woman, he wrote of her:
Beneath the effort directed toward the accumulation of worldly goods lies all too frequently the illusion that this is the most substantial and desirable end to be achieved; but there is, fortunately, a minority composed of those who recognize early in their lives that the most beautiful and satisfying experiences open to human kind are not derived from the outside but are bound up with the individual's own feeling, thinking and acting ….
And as he wrote in his ruminations of later years,5
Life is an adventure, forever wrested from Death. Human civilization through millennia of progress has formed standards of virtue, aspiration, and practical truth, altogether forming an inviolable heritage that is common to all civilized society. Man endures a passionate will to search for justice and truth.
Can a machine possess an adventurous spirit and a will imbued with a deep philosophical belief in virtue and justice in the search for truth? Is a robot afraid of death ("out of order")? Can a robot exhibit and elicit compassion? Will robots ever autonomously cooperate to develop a virtuous robot civilization?
The ability to use conceptual thought before an observation, as Einstein did with his theories of relativity, together with innate human curiosity, spirit, will, desire, and virtue, and a sense of justice in the search for truth, is what separates us from the artificially intelligent robot.
In the Support Vector Machine of Chapter 26, the problem was to find the function that maximizes the extent of the support-vector margins under the constraint that the support vectors must be those closest to the hyperplane. This is just the problem of finding an extremal (either maximum or minimum) function under constraints; it is different from the simpler calculation used in minimizing the Cost Function because the unknown is itself a curve and not just a point on a curve.
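For the simplest possible case, one support vector per class, the constrained maximum-margin solution can be written in closed form. The sketch below is a toy illustration (the points and variable names are assumptions, not taken from Chapter 26): the maximum-margin hyperplane is the perpendicular bisector of the segment joining the two support vectors.

```python
import math

# Toy hard-margin SVM: one support vector per class.
x_pos = (2.0, 2.0)   # support vector with label +1
x_neg = (0.0, 0.0)   # support vector with label -1

# Closed-form solution: w = 2(x+ - x-) / ||x+ - x-||^2, with b chosen so
# that w.x+ + b = +1 and w.x- + b = -1 (margin constraints hold with equality).
d = [p - q for p, q in zip(x_pos, x_neg)]
norm2 = sum(c * c for c in d)
w = [2.0 * c / norm2 for c in d]
b = 1.0 - sum(wi * xi for wi, xi in zip(w, x_pos))

margin = 1.0 / math.sqrt(sum(c * c for c in w))  # geometric margin = 1/||w||
print(w, b, margin)
```

Minimizing ||w||² subject to those two constraints gives exactly this w; with more data points, the same objective is solved numerically by quadratic programming.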
The derivation of the Euler-Lagrange equation from the Calculus of Variations is a good demonstration of how mathematics is done (and perhaps a test of whether a machine can do mathematics like this).
Extremal problems also have an interesting and illuminating history; for instance, finding the maximum was necessary and useful in feudal Europe, where land was ceded from father to sons according to how much land each son could mark off in one day given ropes of equal length. The wise father, through such an IQ test, could ensure that the smartest boy would inherit the most land.
However, it was not those boys who first determined the locus of that rope; rather, it was a girl. The story goes back three thousand years to the Phoenicians and their princess Dido. Fleeing her tyrannical brother, the princess sought refuge in what is today Tunisia on the western shore of North Africa. The king there granted her asylum but dismissively bequeathed her "all the land that could be contained in a bull's skin" as her dominion.
Whereupon the analytical princess proceeded to cut the bull's skin into thin strips, tying them together to form a very long cord; securing one end to a post on the shoreline, she played out the line in a semicircle of considerable radius, enclosing an area far greater than what the cynical king had in mind. Legend has it that this semicircle grew to become the center of the mythical city of Carthage, over which the princess reigned as Queen Dido.1
This appealing story of the maximization of an area encompassed by a curve with arbitrary endpoints became known affectionately as Dido's Problem and technically as the isoperimetric problem. Everyone knows the answer to Dido's Problem; the largest area is of course delineated by a circle, an answer that seems eminently obvious. But mathematicians are strange ducks; they are interested less in the answer and more in the proof that a circle does indeed encompass the greatest area, and the mathematical proof is not so obvious.
There were many attempts at proof, among them Archimedes' inscription of a polygon inside a circle (with vertices touching the circle), performed around 250 BCE. As the number of sides n of the polygon increases to form an n-gon, the area increases, and as the number of sides approaches infinity, the n-gon approaches the circle, the maximum-area "infinigon", since the area continues to grow as sides are added. This approach to the circle by polygons of ever more sides from within is a process Archimedes aptly called exhaustion. The idea, of course, is that any many-sided polygon has less area than the circle that it approaches, thus "proving" that the circle has the maximum area.
A side benefit of this tiring exercise is the determination of the ratio of the circumference of the infinigon to its diameter. Five hundred years after Archimedes, the Chinese mathematician Liu Hui devised an iterative algorithm to construct a 12,288-sided polygon, closely approaching a circle, and giving a value of 3.141592920 for π that stood as the best approximation until 150 years later, when the mathematician-astronomer Zu Chongzhi (429–500 CE) employed a 24,576-sided polygon to obtain 3.1415926–3.1415927, the closest approximation of π for the next 800 years.2
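Archimedes' exhaustion is easy to reproduce numerically. The following sketch (an illustration, not from the text; the function name is assumed) starts from a regular hexagon inscribed in a unit circle, whose side equals the radius, and repeatedly doubles the number of sides using the half-angle recurrence s₂ₙ = √(2 − √(4 − sₙ²)); half the perimeter of the n-gon then approaches π from below. Eleven doublings of the hexagon give exactly Liu Hui's 12,288-gon.

```python
import math

def pi_inscribed(doublings: int) -> float:
    """Approximate pi by Archimedes' exhaustion: repeatedly double the
    number of sides of a polygon inscribed in a unit circle."""
    n, s = 6, 1.0   # inscribed regular hexagon: side length = radius = 1
    for _ in range(doublings):
        s = math.sqrt(2.0 - math.sqrt(4.0 - s * s))  # half-angle recurrence
        n *= 2
    return n * s / 2.0   # half-perimeter of the n-gon approaches pi

# Eleven doublings of the hexagon give Liu Hui's 12,288-sided polygon
print(pi_inscribed(11))
```

In floating point the recurrence eventually loses accuracy from cancellation in 2 − √(4 − s²), so in practice only a few dozen doublings are useful.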
Of course we all know that the area of a circle is given by πr², which can be easily shown by integrating concentric rings continuously up to the rim of the circle (like adding up all the infinigons within the circle). Since the circumference of each ring is 2πr′ (where r′ is a dummy variable), integrating from the center to the rim of the circle gives an area

A = ∫ 2πr′ dr′ (from 0 to r) = πr².
But π is irrational, meaning that it cannot be expressed as the ratio of two integers, and so has a never-ending number of decimal places with no recurring series of digits; even worse, π is transcendental, meaning that it cannot be solved for as a root of a polynomial equation with integer coefficients. Therefore it can only be approximated, through infinite series, trigonometric series, and various iteration techniques such as Liu's algorithm. Thus the rather strange situation of "knowing" π but not its value: being able only to approach ever more closely something that is there (on the number line) but can never be found, even upon ever deeper and finer burrowing.
Albeit eternally elusive, π certainly works in computations (for instance the area of a circle), revealing two important aspects of life: It is the relationships that matter and closeness is good enough. Furthermore, in life as well as mathematics, we may know a thing, yet do we ever really know its value?
The pursuit of that value is an unrelenting passion for some. The current record, set in 2011, is 10^13 digits after the 3, computed by Kondo Shigeru at his home in Tokyo employing the rapidly converging generalized hypergeometric series3
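The display dropped here is presumably the Chudnovsky brothers' series (the next paragraph names the Chudnovsky algorithm); in its standard form,

```latex
\frac{1}{\pi} = 12 \sum_{k=0}^{\infty}
\frac{(-1)^{k}\,(6k)!\,\left(13591409 + 545140134\,k\right)}
     {(3k)!\,(k!)^{3}\,640320^{\,3k+3/2}},
```

each term of which contributes roughly 14 more correct decimal digits.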
Running the above with the Chudnovsky algorithm in Alexander Yee's y-cruncher program on Kondo's home-made 48-terabyte hard-drive system for a year produced so much heat that his (long-suffering) wife Yukiko would bring clothes from the washer directly into his study, noting that "we could dry the laundry very well, but we had to pay ¥30,000 a month for electricity". Raising the bill even further was a back-up power supply, added after some previous computations came to grief when Kondo's teenage daughter turned on her hair dryer.4
A more analytical approach, of course, was to find the maximum curve function mathematically. Although the great mathematical physicists Descartes, Fermat, Galileo, Newton, Leibniz, Huygens, and the brothers Jakob and Johann Bernoulli all contributed to the early use of mathematical extrema in physics, it was the isoperimetric, brachistochrone, and light-ray propagation problems that prompted the development of the variational calculus.
The brachistochrone problem is to find the path along which a body falls fastest under gravity from one point on a fixed wire to another point fixed at a lower height. Most would guess the shortest distance between two points, a straight line, or, as Newton guessed, an arc of a circle. But the answer is, amazingly, a cycloid: the locus of a point on a circle rolling in a straight line on a flat plane. This, reasonably enough, is used in the construction of modern roller coasters.
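A quick numerical check of this surprising claim (a sketch with assumed values, not from the text): for a unit-radius cycloid x = t − sin t, y = 1 − cos t released from the origin (y measured downward), the descent time to parameter t works out exactly to t·√(r/g), while the time down a straight chord to the same endpoint follows in closed form from T = ∫ ds/√(2gy).

```python
import math

g = 9.81  # gravitational acceleration (m/s^2)

# At parameter t = pi the unit-radius cycloid reaches the point (pi, 2),
# and its descent time reduces to t * sqrt(r/g) = pi / sqrt(g).
x1, y1 = math.pi, 2.0
t_cycloid = math.pi / math.sqrt(g)

# Straight chord y = (y1/x1) x: T = integral of ds / sqrt(2 g y), closed form
t_line = 2.0 * math.sqrt((x1 ** 2 + y1 ** 2) / (2.0 * g * y1))

print(t_cycloid < t_line)  # the cycloid beats the straight line
```

The straight line is shorter, but the cycloid lets the body gain speed sooner, and the earlier speed more than pays for the extra distance.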
So the isoperimetric largest area and the fastest curve of descent, plus the analogy of Fermat's principle of least time for the path of a light ray, together formed the extremal problems that led to the variational calculus.5
However, it was not until 1744 that Leonhard Euler, together with Joseph Lagrange, codified the calculus of variations in what has become known as the Euler-Lagrange equation for finding extremal functions.
Generally, a line integral gives the value of a functional that possesses an extremal function y(x), where f is a continuous function describing the physical system. A function of the independent variable x, the dependent variable y that delineates the function of interest, and the derivative of y with respect to x (dy/dx), the change of y with x, is written in general form as

f(x, y, dy/dx),
and the value of f from the arbitrary points xA to xB along a line is given by the integral along that line,

I = ∫ f(x, y, dy/dx) dx  (taken from xA to xB).
To find the extremal function y(x), a test function is defined as

Y(x) = y(x) + εμ(x),
where the real extremal function is given by y(x), and εμ(x) is just the difference between the test function and the actual extremal function.
That difference term, εμ(x), is written so that the ε can serve as a variable that goes to zero to minimize the line integral, as will be seen, and the μ(x) is a function of the independent variable x that can be used as a surrogate for the derivative of the test function and serves to set the endpoints of the sought-after extremal function. For the mathematicians, because y(x) is assumed to be analytic (describable by common elementary mathematical functions), and we do after all want to describe y(x) analytically, it is continuous and differentiable, and so by the same token μ(x) also must be continuous and differentiable.
The first very simple trick of the calculus of variations is in the definition of Y(x) as becoming the sought-after y(x) when ε approaches zero, with the function μ(x) satisfying two boundary conditions, namely that at the integration limits μ(x) must be zero, so μ(xA) = 0 and μ(xB) = 0. That is, when ε → 0, Y(x) approaches y(x), and the test function then becomes the desired extremal function. The line integral of f with the test function as dependent variable then contains the extremal function in question,

J(ε) = ∫[xA, xB] f(x, Y, dY/dx) dx
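The setup above can be sketched numerically. The following is an assumed concrete example, not taken from the text: the arc-length functional f = sqrt(1 + (dY/dx)²) on [0, 1] with endpoints (0, 0) and (1, 1), whose true extremal y(x) = x is the straight line, perturbed by μ(x) = sin(πx), which satisfies μ(0) = μ(1) = 0. Computing J(ε) for several ε shows the line integral is smallest at ε = 0, where the test function coincides with the extremal.

```python
import numpy as np

# Sample points on [0, 1]; endpoints are (0, 0) and (1, 1)
x = np.linspace(0.0, 1.0, 2001)

def J(eps):
    """Line integral J(eps) of sqrt(1 + Y'^2) for Y(x) = x + eps*sin(pi*x)."""
    Y = x + eps * np.sin(np.pi * x)        # test function; mu(0) = mu(1) = 0
    dY = np.gradient(Y, x)                 # numerical dY/dx
    integrand = np.sqrt(1.0 + dY**2)
    # trapezoidal rule, written out to stay NumPy-version independent
    return float(np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(x)))

# J(eps) is smallest at eps = 0: the unperturbed straight line
for eps in (-0.2, -0.1, 0.1, 0.2):
    assert J(eps) > J(0.0)
```

At ε = 0 the integral is the length of the straight line, √2; every perturbation with fixed endpoints lengthens the path.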
Now the objective is to extremize the above line integral J(ε), keeping in mind that J is a function of the parameter ε; in this case, extremization with respect to a single variable can use the elementary extrema technique of the differential calculus. The extremum of J occurs as ε → 0, so the necessary condition for extremizing J is found by setting the derivative with respect to the parameter ε equal to zero as ε → 0,

dJ/dε = 0 as ε → 0
Substituting the line integral J from above into the derivative and then differentiating gives

dJ/dε = d/dε ∫[xA, xB] f(x, Y, dY/dx) dx
For convenience, it is customary to write Y′ = dY/dx, where the prime denotes the derivative with respect to x, so the above equation may be written more compactly as

dJ/dε = d/dε ∫[xA, xB] f(x, Y, Y′) dx
When the limits of integration (xA and xB) are not functions of the variable of differentiation ε, as is the case here, the derivative of the integral is just the integral of the derivative (Leibniz's Rule), so

dJ/dε = ∫[xA, xB] (df/dε) dx
According to the chain rule of differentiation,

df/dε = (∂f/∂Y)(∂Y/∂ε) + (∂f/∂Y′)(∂Y′/∂ε)
then

dJ/dε = ∫[xA, xB] [(∂f/∂Y)(∂Y/∂ε) + (∂f/∂Y′)(∂Y′/∂ε)] dx
Because Y = y(x) + εμ(x), ∂Y/∂ε = μ(x), and ∂Y′/∂ε = μ′(x), then

dJ/dε = ∫[xA, xB] [(∂f/∂Y)μ(x) + (∂f/∂Y′)μ′(x)] dx
When ε → 0, Y → y and Y′ → y′, so

dJ/dε|ε→0 = ∫[xA, xB] [(∂f/∂y)μ(x) + (∂f/∂y′)μ′(x)] dx = 0
Now the second trick of the calculus of variations is to use the rule for integration by parts,

∫ u dv = uv − ∫ v du
and for the second term on the right-hand side of the above extremization equation, set

u = ∂f/∂y′ and dv = μ′(x) dx
and because

du = [d/dx(∂f/∂y′)] dx and v = μ(x)
then

∫[xA, xB] (∂f/∂y′)μ′(x) dx = [(∂f/∂y′)μ(x)] evaluated from xA to xB − ∫[xA, xB] μ(x) [d/dx(∂f/∂y′)] dx
so

∫[xA, xB] (∂f/∂y′)μ′(x) dx = −∫[xA, xB] μ(x) [d/dx(∂f/∂y′)] dx
Thus, in integrating by parts, the derivative of μ(x) is dispensed with, and the boundary conditions μ(xA) = 0 and μ(xB) = 0 take care of the first term on the right-hand side of the equation,

[(∂f/∂y′)μ(x)] evaluated from xA to xB = 0
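This by-parts step can be checked symbolically. In the sketch below, h(x) is an assumed concrete stand-in for ∂f/∂y′, and μ(x) = sin(πx) satisfies the boundary conditions μ(0) = μ(1) = 0 on [0, 1], so the boundary term [hμ] vanishes and ∫ h μ′ dx = −∫ h′ μ dx exactly.

```python
import sympy as sp

x = sp.symbols('x')
h = sp.exp(x)               # arbitrary smooth stand-in for df/dy' (assumption)
mu = sp.sin(sp.pi * x)      # mu(0) = mu(1) = 0, so the boundary term drops

# Integration by parts with a vanishing boundary term:
lhs = sp.integrate(h * mu.diff(x), (x, 0, 1))
rhs = -sp.integrate(h.diff(x) * mu, (x, 0, 1))
assert sp.simplify(lhs - rhs) == 0
```

Any smooth h(x) works here; only the endpoint conditions on μ(x) matter.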
Now returning to

dJ/dε|ε→0 = ∫[xA, xB] [∂f/∂y − d/dx(∂f/∂y′)] μ(x) dx = 0
in order to dispense with the μ(x) altogether, since the integral must vanish for every admissible μ(x),
the fundamental lemma of the calculus of variations is invoked: within (xA, xB), because μ(x) is an arbitrarily chosen function, it is the expression in the square brackets that must vanish. This then is the Euler-Lagrange equation,6

∂f/∂y − d/dx(∂f/∂y′) = 0
If the general function f is taken as the difference between the kinetic energy (K) and the potential energy (U) of a system, the Lagrangian L = K − U, then the Euler-Lagrange equation as used in physics and chemistry is

d/dt(∂L/∂q̇) − ∂L/∂q = 0
This equation, derived from the calculus of variations, is used to find the minimum distance of the support vectors to the margins in the support vector machine.
The Lagrangian L(x, λ) includes a Lagrange multiplier λ that transforms the constrained problem into an unconstrained problem so that the extremal points (derivative = 0) can be found. This is done using the artifice

L(x, λ) = f(x) + λg(x)
where g(x) is the equality constraint, and clearly when λ = 0 the Lagrangian is just the desired function f(x).
The Lagrange multiplier λ is just the rate at which the extremal value of the Lagrangian changes as a function of the constraint parameter c (in the SVM case, the constraint that the support vectors must be the vectors closest to the hyperplane).
When performing the minimax on L(x, λ), min over x of max over λ of L(x, λ), and vice versa: simply put, since the support vectors at the extremes are parallel or perpendicular to the hyperplane (as measured by the inner product), the minimum and maximum are "fighting each other", so λ → 0 and the minimum distance of the support vectors from the hyperplane may be calculated from the Euler-Lagrange equation.
The Euler-Lagrange equation is just one of the many achievements of Leonhard Euler, the Swiss mathematician who, in the words of the great Laplace, "was the master of us all". Indeed, although educated as a theologian and physician, and initially appointed in 1727 as an assistant in the medical department at the Imperial Russian Academy of Sciences, notre maître à tous quickly produced seminal discoveries in mathematics while serving Peter the Great in his desire for Russia to catch up to Western European science. He worked with Johann's son Daniel Bernoulli, and remained in St. Petersburg until 1741, when a rising Russian nationalism caused conditions to deteriorate for the foreign scholars recruited to Russia, and upon an invitation from Frederick the Great, Euler followed Daniel to the Berlin Academy. There he continued to produce original works in mathematical analysis and differential calculus and contributed to many areas of the mathematics and natural philosophy of the time.
Euler's duties also included tutoring Frederick's niece, and his 200 letters on many and various subjects were later compiled into a best-selling book entitled Lettres à une Princesse d'Allemagne. Alas, even this great service to his niece did not persuade the great Frederick of Euler's worth, who preferred Euler's fellow Academician Voltaire's sophistry to Euler's logic. Disfavored by Frederick and ridiculed by Voltaire, the homely and down-to-earth Euler left Berlin in 1766 to return to St. Petersburg where Catherine the Great had ascended the throne and resurrected Peter's Academy. It was there that Euler, slowly succumbing to blindness but still producing monumental mathematics, worked assiduously until his death in 1783.
All the Greats wanted Euler in the hope that his brilliance might bring them prestige, but they could never really appreciate Euler unless they understood at least a modicum of mathematics. Frederick was typical, accommodating Euler, but charmed and won over by a flippant Voltaire who foppishly bullied the self-effacing mathematical genius.
The man with the very French-sounding name of Joseph-Louis Lagrange was actually an Italian named Giuseppe Lodovico Luigi Lagrangia. His seminal work Méchanique Analytique changed the pedestrian Newtonian cause-and-effect motion of F = ma to a grand natural purpose: deriving all the equations of motion through minimization of the Lagrangian, the difference between kinetic and potential energy, which characterizes the motion of any object.
From the Turin Academy, Lagrangia sent copies of his mechanics work to Euler, who tried mightily to bring Lagrangia to the Berlin Academy, but succeeded only after he himself had left. Frederick the Great wanted "the greatest mathematician in Europe" to replace Euler, and Lagrangia did not disappoint, for among many other achievements he found the Lagrangian Points, stable positions of very small bodies in the field of two massive bodies, such as a satellite between the Sun and Earth: a special solution to the three-body problem.
Under the same forces that Euler had experienced before him, Lagrangia fatefully left Berlin for Paris in 1786, just in time for the Revolution and its horrific aftermath. As a foreigner, he was about to be expelled from France in 1793, were it not for the intervention of his friend Antoine Lavoisier, the father of modern chemistry. France made amends much later by honoring Lagrange with the inscription of his name, along with the other greats in the history of France, on a plaque on the Eiffel Tower. Lavoisier was not so fortunate: because he had been a tax collector, during the new Republic's Reign of Terror against functionaries of the Ancien Régime he was branded a traitor by Robespierre and guillotined in 1794. An appeal to save his life was dismissed by the judge with the words:7
La République n'a pas besoin de savants ni de chimistes,
le cours de la justice ne peut être suspendu

The Republic needs neither scientists nor chemists,
the course of justice cannot be delayed
His friend Lagrange, lamenting Lavoisier's fate, put the tragedy in poignant perspective,
Cela leur a pris seulement un instant pour lui couper la tête, mais la France pourrait ne pas en produire une autre pareille en un siècle

It took them only an instant to cut off his head,
but France may not produce another such head in a century.